Abstract

A decision maker records measurements of a finite-state Markov chain corrupted by noise. The 
goal is to decide when the Markov chain hits a specific target state. The decision maker can choose 
from a finite set of sampling intervals to pick the next time to look at the Markov chain. The aim is to 
optimize an objective comprising false alarm, delay, and cumulative measurement sampling costs.
Taking more frequent measurements yields accurate estimates but incurs a higher measurement cost. 
Making an erroneous decision too soon incurs a false alarm penalty. Waiting too long to declare the 
target state incurs a delay penalty. What is the optimal sequential strategy for the decision maker? The 
paper shows that under reasonable conditions, the optimal strategy has the following intuitive structure: 
when the Bayesian estimate (posterior distribution) of the Markov chain is away from the target state, 
look less frequently; while if the posterior is close to the target state, look more frequently. Bounds 
are derived for the optimal strategy. The achievable optimal cost of the sequential detector is also
analyzed as a function of the transition dynamics and observation distribution. The sensitivity of the optimal
achievable cost to parameter variations is bounded in terms of the Kullback-Leibler divergence. To prove the
results in this paper, novel stochastic dominance results on the Bayesian filtering recursion are derived. 



The formulation in this paper generalizes quickest time change detection to consider optimal sampling 
and also yields useful results in sensor scheduling (active sensing). 

Index Terms 

change detection, optimal sequential sampling, decision making, Bayesian filtering, stochastic dominance,
submodularity, stochastic dynamic programming, partially observed Markov decision process

I. Introduction and Examples

A. The Problem 

Consider the following quickest detection optimal sampling problem, which is a special case
of the problem considered in this paper. Let $\tau_k$, $k = 0, 1, \ldots$ denote the time instants at which
decisions to observe a noisy finite-state Markov chain are made. As it accumulates measurements
over time, a decision-maker needs to announce when the Markov chain hits a specific absorbing
target state. At each decision time $\tau_k$, the decision maker needs to pick its decision from the
action set $U = \{0 \text{ (announce change)}, D_1, D_2, \ldots, D_L\}$ where

• Decision $u_k = 0$ made at time $\tau_k$ corresponds to "announce the target state and stop".
When this decision is made, the problem terminates at time $\tau_k$ with possibly a false alarm
penalty (if the Markov chain was not in the target state).

• Decision $u_k \in \{D_1, D_2, \ldots, D_L\}$ at time $\tau_k$ corresponds to: "Look at noisy Markov chain
next at time $\tau_{k+1} = \tau_k + u_k$." Here $D_1 < D_2 < \cdots < D_L$ are fixed positive integers. They
denote the set of possible time delays to sample the Markov chain next.

Given the history of past measurements and decisions, how should the decision-maker choose
its decisions $u_k$? Let $t^*$ denote the time at which the Markov chain hits the absorbing target
state and $k^*$ denote the time at which the decision maker announces that the Markov chain has
hit the target state. The decision-maker considers the following costs:

(i) False alarm penalty: If $k^* < t^*$, i.e., the Markov chain is not in the target state, but the
decision-maker announces that the chain has hit the target state, it pays a false alarm penalty $f$.

(ii) Delay penalty: If $k^* > t^*$, i.e., the Markov chain hits the target state and the decision-maker
does not announce this, it pays a delay penalty $d$. The decision maker continues to pay this
delay penalty over time until it announces the target state has been reached.

(iii) Sampling cost: At each decision time $\tau_k$, the decision maker looks at the noisy Markov
chain and pays a measurement (sampling) cost $m$.

Suppose the Markov chain starts with initial distribution $\pi_0$ at time 0. What is the optimal
sampling strategy $\mu^*$ for the decision-maker to minimize the following combination of the false
alarm rate, delay penalty and measurement cost? That is, determine $\mu^*$ such that
$J_{\mu^*}(\pi_0) = \inf_\mu J_\mu(\pi_0)$, where

$$J_\mu(\pi_0) = d\, E^\mu_{\pi_0}\{(k^* - t^*)^+\} + f\, P^\mu_{\pi_0}(k^* < t^*) + m \sum_{u=1}^{L} \sum_{k : \tau_k < k^*} P^\mu_{\pi_0}(u_k = u). \tag{1}$$

Here $\mu$ denotes a stationary strategy of the decision maker. $P^\mu_{\pi_0}$ and $E^\mu_{\pi_0}$ are the probability
measure and expectation of the evolution of the observations and Markov state, which are strategy
dependent (these are defined formally in Sec. II). Taking frequent measurements yields accurate
estimates but incurs a higher measurement cost. Making an erroneous decision too soon incurs
a false alarm penalty. Waiting too long to declare the target state incurs a delay penalty.

B. Context 

In the special case when the change time $t^*$ is geometrically distributed (equivalently, the
Markov chain has two states), the action space is $U = \{0 \text{ (announce change)}, 1 \text{ (continue)}\}$, and the
measurement cost is $m = 0$, (1) becomes the classical Kolmogorov-Shiryayev quickest detection
problem [23], [20]. Our setup generalizes this in the following non-trivial ways:
First, unlike quickest detection, there are now multiple "continue" actions $u \in \{1, 2, \ldots, L\}$
corresponding to different sampling delays $\{D_1, D_2, \ldots, D_L\}$. (In quickest detection there is
only one continue action and one stop action.) Each of these "continue" actions results in different
dynamics of the posterior distribution and incurs different costs. Also, the measurement costs can
be state and action dependent.

Second, allowing for the underlying Markov chain to have multiple states facilitates modelling
general phase-distributed (PH-distributed) change times (compared to two-state Markov chains
that model geometrically distributed change times). As described in [18], a PH-distributed change
time can be modelled as a multi-state Markov chain with an absorbing state. The optimal
detection of a PH-distributed change point is useful since PH-distributions form a dense subset
for the set of all distributions; see [11] for quickest detection with PH-distributed change times.

C. Main Results, Organization and Related Works 

This paper analyzes the structure of the optimal sampling strategy of the decision-maker.
The problem is an instance of a partially observed Markov decision process (POMDP). In
general, solving POMDPs and therefore determining the optimal strategy is computationally
intractable (PSPACE hard [19]). However, returning to the example considered above, intuition
suggests that the following strategy would be sensible (recall that the action set is
$U = \{0 \text{ (announce change)}, D_1, D_2, \ldots, D_L\}$):

• If the Bayesian posterior distribution estimate of the Markov chain (given past observations
and decisions) is away from the target state, look infrequently at the noisy Markov chain,
i.e., pick a large sampling interval $D_u$. Since we are interested in detecting when the Markov
chain hits the target state, there is little point in incurring a measurement cost by looking
at the Markov chain when its estimate suggests that it is far away from the target state.

• If the posterior distribution is close to the target state, then pay a higher sampling cost and
look more frequently at the noisy Markov chain, i.e., pick a small sampling interval $D_u$.

• If the posterior is sufficiently close to the target state, then announce the target state has
been reached, i.e., choose action $u = 0$.

The key point is that such a strategy (choice of sampling interval $D_u$) is monotonically decreasing
as the posterior distribution gets closer to the target state. By using stochastic dominance and
lattice programming analysis, this paper shows that under reasonable conditions, the optimal
sampling strategy always has this monotone structure. Lattice programming was championed by
[25] and provides a general set of sufficient conditions for the existence of monotone strategies in
stochastic control problems. This area falls under the general umbrella of monotone comparative
statics that has witnessed remarkable interest in the area of economics. Our results apply to
general observation distributions (Gaussians, exponentials, Markov modulated Poisson, discrete
memoryless channels, etc.) and multi-state Markov chains.

In more detail, this paper establishes the following structural results:
(i) For two-state Markov chains observed in noise, since the elements of the two-dimensional
posterior probability mass function add to 1, it suffices to consider one element of this posterior
- this element is a probability and lies in the interval [0, 1]. Theorems 1 and 2 show that
under reasonable conditions the optimal sampling strategy of the decision-maker has a monotone
structure in the posterior distribution. The monotone structure of Theorem 1 reduces a function
space optimization problem (dynamic programming on the space of posterior distributions) to a
finite dimensional optimization - since a monotone strategy with $L$ possible actions has at most
$L - 1$ thresholds in the space of posterior distributions. The threshold values can be estimated via
simulation based stochastic approximation. The monotone structure holds even for large delay
penalty and measurement cost that is independent of the state. If satisfaction is viewed as the
number of times the decision maker looks at the Markov chain, Theorems 1 and 2 say that
"delayed satisfaction" is optimal. These theorems also directly apply to a measurement control
model recently developed in [3] as will be discussed in Sec. III.

(ii) For general-state Markov chains (which can model PH-distributed change times) observed in
noise, the posterior lies in an $X - 1$ dimensional unit simplex. Theorem 4 shows that the optimal
decision-maker's sampling strategy can be under-bounded by a judiciously chosen myopic strategy
on the unit simplex of posterior distributions. Therefore the myopic strategy forms an easily
computable rigorous lower bound to the optimal strategy. Sufficient conditions are given for
the myopic strategy to have a monotone structure with respect to the monotone likelihood ratio
stochastic order on the simplex. Theorem 5 illustrates the result for quickest detection problems.

(iii) How does the optimal expected sampling cost vary with the transition matrix and noise
distribution? Is it possible to order these parameters such that the larger they are, the larger the
optimal sampling cost? Such a result would allow us to compare the optimal performance of
different sampling models, even though computing these is intractable. For general-state Markov
chains observed in noise, Theorem 6 examines how the cost achieved by the optimal sampling
strategy varies with the transition matrix (state dynamics) and observation matrix (noise
distribution). In particular, dominance measures are introduced for the transition matrix and observation
distribution (Blackwell dominance) that result in the optimal cost increasing with respect to this
dominance order. Theorem 6 shows that for optimal sampling problems, certain PH-distributions
for the change time result in larger total optimal cost compared to other distributions.

(iv) Theorem 7 derives sensitivity bounds on the total cost for optimal sampling with a mismatched
model. That is, when the optimal strategy computed for a specific sampling model
is used for a different sampling model, Theorem 7 gives an explicit bound on the performance
degradation. In particular, by elementary use of the Pinsker inequality [6], Theorem 7 shows that
the sensitivity is a linear function of the Kullback-Leibler divergence between the two models.
Also, the bounds are tight in the sense that if the difference between the two models goes to
zero, so does the performance degradation.

(v) To prove the above results, several important stochastic dominance properties of the Bayesian
filter are presented in Theorem 9. How does the posterior distribution computed by the Bayesian
filter vary with observation, prior, transition matrix and observation matrix? Is it possible to order
these so that the posterior distribution increases with respect to this ordering? These results are of
independent interest. The theorem gives sufficient conditions for the Bayesian filtering recursion
to preserve the MLR (monotone likelihood ratio) stochastic order, and for the normalization
measure to be submodular. It also shows that, starting with two different transition matrices
but identical priors, the optimal predictor with the larger transition matrix (in terms of the
order introduced in [29]) MLR dominates the predictor with the smaller transition matrix.

Related Works: In this paper we consider sampling control with change detection. A related
problem is measurement control where at each time the decision is made whether to take a
measurement or not. This is the subject of the recent paper¹ [3] which considers geometrically
distributed change times (2-state Markov chain). The problem in [3] can be formulated in terms
of our optimal sampling problem. We discuss this further in Sec. III-A.

We also refer to the seminal work of Moustakides (see [17] and references therein) in event
triggered sampling. Quickest detection has been studied widely, see [20], [24] and references
therein. We have recently considered a POMDP approach to quickest detection with social
learning [12] and non-linear penalties [11] and phase-distributed change times. However, in
these papers, there is only one continue and one stop action. The results in the current paper are
considerably more general due to the propagation of different dynamics for the multiple continue
actions. A useful feature of the lattice programming approach [1], [16], [21] used in this paper is
that the results apply to general observation noise distributions (Gaussians, exponentials, discrete
memoryless channels) and multiple state Markov chains. Also, the results proved here are valid
for finite sample sizes and no asymptotic approximations in signal to noise ratio are used.

D. Examples: Change Detection and Sensor Scheduling 

Several examples in statistical signal processing are special cases of the above measurement- 
sampling control model. The terms active/smart/cognitive sensing imply the use of feedback of 
previous estimates and decisions to choose the current optimal decision. 

Example 1. Quickest Time Change Detection with Optimal Sampling: Return to the problem
considered at the beginning of this section. The action space is $\{0 \text{ (announce change)}, D_1 = 1,
D_2 = 3, D_3 = 5, D_4 = 10\}$. That is, at each decision time, the decision maker has the
option of either stopping or looking at a 2-state Markov chain every 1, 3, 5 or 10 time points.
Suppose the decision maker observes the underlying Markov chain via a binary erasure channel
(parameter values are specified in Sec. VI).

¹The author is very grateful to Dr. Venu Veeravalli of U. Illinois Urbana Champaign for sharing the results in [3] and several
useful discussions.

Fig. 1. Optimal sampling strategy $\mu^*(\pi)$ for action space $u \in \{0 \text{ (announce change)}, D_1 = 1, D_2 = 3, D_3 = 5, D_4 = 10\}$
for a quickest-change detection problem with geometric change time. The noisy observations are from a binary erasure channel
and the parameters are specified in Example 1 of Sec. VI. Figure 1(a) depicts a monotone decreasing optimal strategy in posterior
$\pi(1)$. Theorem 1 gives sufficient conditions under which the optimal sampling strategy $\mu^*(\pi)$ has this structure. The threshold
values $\pi_1^*, \pi_2^*, \pi_3^*, \pi_4^*$ give a finite dimensional characterization of the optimal strategy. Figure 1(b) gives an example where the
conditions of Theorem 1 are violated and the optimal strategy is no longer monotone in $\pi(1)$.

Theorem 1 shows that the optimal strategy has the monotone structure in posterior $\pi(1)$ depicted
in Figure 1(a). The horizontal axis in Figure 1(a) denotes the Bayesian posterior $\pi(1)$ while
the vertical axis denotes the optimal action taken. Therefore, when the posterior is less than
$\pi_4^*$, it is optimal to look every 10 time points at the noisy Markov chain; for posterior in
the interval $[\pi_4^*, \pi_3^*]$, look every 5 points at the noisy Markov chain, etc. Thus one only needs
to compute/estimate the threshold values $\pi_1^*, \pi_2^*, \pi_3^*, \pi_4^*$ to determine the optimal strategy. The
usefulness of Theorem 1 is further enhanced by noting that in general (without introducing
conditions) the optimal strategy does not have this property. Figure 1(b) gives an example
where the sufficient conditions of Theorem 1 are violated and the optimal strategy is no longer
monotone.

Example 2. Sensor Measurement Scheduling: In sensor and radar resource management problems,
the sensor is a resource that needs to be allocated amongst several targets [10], [14].
Deploying a sensor to look at a target consumes sensor resources. How should a sensor scheduler
decide how often to look at a target in order to detect if the target has made a sudden maneuver
(modelled by the Markov chain jumping to a target state)? In radar resource management [13]
this is called the revisit time problem.
II. Formulation of Optimal Sampling Problem

Let $t = 0, 1, \ldots$ denote discrete time and $x_t$ denote a Markov chain on the finite state space

$$\{e_1, \ldots, e_X\} \text{ where } e_i \text{ is the } X\text{-dimensional unit vector with 1 in the } i\text{-th position.} \tag{2}$$

Here state '1' (corresponding to $e_1$) is labelled as the "target state". Denote

$$\mathcal{X} = \{1, 2, \ldots, X\}. \tag{3}$$

Denote the $X \times X$ transition probability matrix by $A$ and the $X \times 1$ initial distribution by $\pi_0$, where

$$A = (A_{ij},\ i, j \in \mathcal{X}), \quad A_{ij} = P(x_{t+1} = e_j \mid x_t = e_i), \quad \pi_0 = (\pi_0(i),\ i \in \mathcal{X}), \quad \pi_0(i) = P(x_0 = e_i). \tag{4}$$

A. Measurement Sampling Protocol 

Let $\tau_1, \ldots, \tau_{k-1}$ denote previous discrete time instants at which measurement samples were
taken. Let $\tau_k$ denote the current time instant at which a measurement is taken. The measurement
sampling protocol proceeds according to the following steps:

Step 1. Observation: A noisy measurement $y_k \in Y$ at time $\tau_k$ of the Markov chain is obtained
with conditional probability distribution

$$P(y_k \le \bar{y} \mid x_{\tau_k} = e_x) = \sum_{y \le \bar{y}} B_{xy}, \quad x \in \mathcal{X}. \tag{5}$$

Here $\sum_y$ denotes integration with respect to the Lebesgue measure (in which case $Y \subset \mathbb{R}$ and $B_{xy}$
is the conditional probability density function) or counting measure (in which case $Y$ is a subset
of the integers and $B_{xy}$ is the conditional probability mass function $B_{xy} = P(y_k = y \mid x_{\tau_k} = e_x)$).

Step 2. Sequential Decision Making: Denote the filtration generated by measurements and past
decisions (denoted $u_1, \ldots, u_{k-1}$) as

$$\mathcal{F}_k = \sigma\text{-algebra generated by } (y_1, \ldots, y_k, u_1, \ldots, u_{k-1}). \tag{6}$$

At time $\tau_k$, an $\mathcal{F}_k$ measurable decision $u_k \in U$ is taken where action

$$u_k = \mu(\mathcal{F}_k) \in U = \{0 \text{ (announce change)}, 1, 2, \ldots, L\} \tag{7}$$

and $u_k = l$ denotes: obtain next measurement after $D_l$ time points, $l \in \{1, 2, \ldots, L\}$.
In (7), the strategy $\mu$ belongs to the class of stationary decision strategies denoted $\boldsymbol{\mu}$. Also,
$D_1, \ldots, D_L$ are distinct positive integers that denote the set of possible sampling time intervals.
Thus the decision $u_k$ specifies the next time $\tau_{k+1}$ to make a measurement as follows:

$$\tau_{k+1} = \tau_k + D_{u_k}, \quad u_k \in \{1, 2, \ldots, L\}. \tag{8}$$

Step 3. Costs: Associated with the decision $u_k \in U$, a cost $c(x_t, u_k)$ is incurred by the decision-maker
at each time $t \in [\tau_k, \ldots, \tau_{k+1} - 1]$ until the next measurement is taken at time $\tau_{k+1}$. Also
a non-negative measurement sampling cost $m(x_{\tau_k}, u_k)$ is incurred.

Step 4: If $u_k = 0$, the problem terminates, else set $k$ to $k + 1$ and go to Step 1.

Belief State Formulation: It is convenient to re-express Step 2 of the above protocol in terms
of the belief state. It is well known from elementary stochastic control [16] that the belief
state (posterior) constitutes a sufficient statistic for $\mathcal{F}_k$ in (6). Denote the belief state as
$\pi_k = E\{x_{\tau_k} \mid \mathcal{F}_k\}$. Since the state space (2) comprises unit indicator vectors, conditional probabilities
and conditional expectations coincide. So

$$\pi_k = (\pi_k(i),\ i \in \mathcal{X}) \text{ where } \pi_k(i) = P(x_{\tau_k} = e_i \mid y_1, \ldots, y_k, u_1, \ldots, u_{k-1}), \text{ initialized by } \pi_0. \tag{9}$$

It is easily proved that the belief state is updated via the Bayesian (Hidden Markov Model) filter

$$\pi_k = T(\pi_{k-1}, y_k, u_{k-1}), \text{ where } T(\pi, y, u) = \frac{B_y (A')^{D_u} \pi}{\sigma(\pi, y, u)}, \quad \sigma(\pi, y, u) = \mathbf{1}_X' B_y (A')^{D_u} \pi, \tag{10}$$

$$B_y = \mathrm{diag}(P(y \mid x),\ x \in \mathcal{X}).$$

Here $\sigma(\pi, y, u)$ is the normalization measure of the Bayesian update with $\sum_y \sigma(\pi, y, u) = 1$.
Also $\mathbf{1}_X$ denotes the $X$-dimensional vector of ones. Note that $\pi$ in (10) is an $X$-dimensional
probability vector. It belongs to the $X - 1$ dimensional unit simplex denoted as

$$\Pi(X) = \{\pi \in \mathbb{R}^X : \mathbf{1}_X' \pi = 1,\ 0 \le \pi(i) \le 1 \text{ for all } i \in \mathcal{X}\}. \tag{11}$$

For example, $\Pi(2)$ is a one-dimensional simplex (unit line segment), $\Pi(3)$ is a two-dimensional
simplex (equilateral triangle), $\Pi(4)$ is a tetrahedron, etc. Note that the unit vector states $e_1, e_2, \ldots, e_X$
defined in (2) of the Markov chain $x$ are the vertices of $\Pi(X)$.
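As a numerical illustration of the filter (10), the following is a minimal Python sketch, assuming a finite observation alphabet so that $B$ is a matrix with entries `B[x][y]`. The function name `hmm_filter_update`, the zero-based indexing (index 0 is the target state $e_1$), and the example values of `A`, `B` and the delay are illustrative assumptions, not values from this paper.

```python
def transpose(M):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]


def mat_vec(M, v):
    """Matrix-vector product M v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def hmm_filter_update(pi, y, delay, A, B):
    """One step of the Bayesian filter (10).

    pi    : prior belief (list of X probabilities)
    y     : index of the observed symbol
    delay : sampling interval D_u (positive integer)
    A, B  : transition matrix and observation matrix B[x][y]
    Returns (T(pi, y, u), sigma(pi, y, u)).
    """
    A_t = transpose(A)
    pred = list(pi)
    for _ in range(delay):
        pred = mat_vec(A_t, pred)          # (A')^{D_u} pi
    unnorm = [B[x][y] * pred[x] for x in range(len(pi))]  # B_y (A')^{D_u} pi
    sigma = sum(unnorm)                    # 1' B_y (A')^{D_u} pi
    if sigma == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [v / sigma for v in unnorm], sigma


# Hypothetical 2-state example: state index 0 is the absorbing target state.
A = [[1.0, 0.0], [0.1, 0.9]]
B = [[0.8, 0.2], [0.3, 0.7]]
pi0 = [0.0, 1.0]
posterior, sigma0 = hmm_filter_update(pi0, 0, 3, A, B)
_, sigma1 = hmm_filter_update(pi0, 1, 3, A, B)
```

In this sketch the two normalization terms `sigma0` and `sigma1` add to one, which mirrors the property $\sum_y \sigma(\pi, y, u) = 1$ stated above.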

Step 2 in the above protocol expressed in terms of the belief state reads: At decision time $\tau_k$

• Step 2(a). Update belief state $\pi_k$ according to the Bayesian filter (10).

• Step 2(b). Make decision $u_k \in U$ using stationary strategy $\mu$ as (see (7))

$$u_k = \mu(\pi_k) \in U = \{0 \text{ (announce change)}, 1, 2, \ldots, L\}. \tag{12}$$
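The protocol above can be sketched as a single simulated run for a 2-state chain with binary observations. This is only an illustrative sketch under several assumptions that are not from the paper: the strategy is a threshold rule on $\pi(1)$, the decision maker starts from the prior and decides before taking the first observation, the costs are a false alarm penalty `f`, a per-time-step delay penalty `d` and a constant measurement cost `m`, and all names (`simulate_once`, `threshold_action`, `belief_update`) and numerical values are hypothetical.

```python
import random
from bisect import bisect_right


def threshold_action(p, thresholds):
    """Action = number of thresholds strictly above p (0 means stop)."""
    ascending = sorted(thresholds)
    return len(ascending) - bisect_right(ascending, p)


def belief_update(p, y, delay, a22, B):
    """Filter (10) for X = 2, with p = pi(1) and absorbing target state."""
    pred_target = p + (1.0 - p) * (1.0 - a22 ** delay)
    w_target = B[0][y] * pred_target
    sigma = w_target + B[1][y] * (1.0 - pred_target)
    return w_target / sigma if sigma > 0.0 else pred_target


def simulate_once(delays, thresholds, a22, B, f, d, m, p0, rng,
                  max_decisions=100000):
    """Return (total cost, stopping time) of one run of Steps 1 to 4."""
    in_target = rng.random() < p0          # draw x_0 from the prior
    p, tau, cost = p0, 0, 0.0
    for _ in range(max_decisions):
        u = threshold_action(p, thresholds)            # Step 2(b)
        if u == 0:                                     # Step 4: stop
            if not in_target:
                cost += f                              # false alarm
            return cost, tau
        delay = delays[u - 1]
        cost += m                                      # measurement cost
        for _step in range(delay):                     # Step 3: delay cost
            if in_target:
                cost += d
            elif rng.random() > a22:
                in_target = True                       # chain hits target
        tau += delay
        row = B[0] if in_target else B[1]
        y = 0 if rng.random() < row[0] else 1          # Step 1: observe
        p = belief_update(p, y, delay, a22, B)         # Step 2(a)
    raise RuntimeError("no stop decision within max_decisions")


B = [[0.8, 0.2], [0.3, 0.7]]
run_cost, run_tau = simulate_once([1, 3], [0.9, 0.5], 0.9, B,
                                  5.0, 1.0, 0.1, 0.0, random.Random(0))
```

Averaging `run_cost` over many independent runs would give a Monte Carlo estimate of the objective for the chosen thresholds; the sketch does not attempt to optimize them.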



B. Sequential Decision-maker's Objective and Stochastic Dynamic Programming 

Given the above protocol with measurement-sampling strategy $\mu$ in (12), we now define the
objective of the sequential decision maker. Let $(\Omega, \mathcal{F})$ be the underlying measurable space where
$\Omega = (\mathcal{X} \times U \times Y)^\infty$ is the product space, which is endowed with the product topology, and $\mathcal{F}$ is
the corresponding product sigma-algebra. For any $\pi_0 \in \Pi(X)$ and strategy $\mu \in \boldsymbol{\mu}$, there exists
a (unique) probability measure $P^\mu_{\pi_0}$ on $(\Omega, \mathcal{F})$, see for details. Let $E^\mu_{\pi_0}$ denote the expectation
with respect to the measure $P^\mu_{\pi_0}$.

Define the $\{\mathcal{F}_k, k \ge 1\}$ measurable stopping time $k^*$ as

$$k^* = \inf\{k : u_k = 0 \text{ (announce target state and stop)}\}. \tag{13}$$



That is, $k^*$ is the time at which the decision maker declares the target state has been reached
and the problem terminates. For each initial distribution $\pi_0 \in \Pi(X)$ and strategy $\mu$, the decision
maker's global objective function is

$$J_\mu(\pi_0) = E^\mu_{\pi_0}\left\{ \sum_{k=1}^{k^*-1} \left[ m(x_{\tau_k}, u_k) + \sum_{t=\tau_k}^{\tau_{k+1}-1} c(x_t, u_k) \right] + c(x_{\tau_{k^*}}, u_{k^*}) \right\}. \tag{14}$$

Recall that $c(x, u)$ and measurement sampling cost $m(x, u)$ are defined in Step 3 of the protocol.
Using the smoothing property of conditional expectations, (14) can be expressed in terms of the
belief state $\pi$ as

$$J_\mu(\pi_0) = E^\mu_{\pi_0}\left\{ \sum_{k=1}^{k^*-1} C(\pi_k, u_k) + C(\pi_{k^*}, u_{k^*} = 0) \right\} \tag{15}$$

where $C(\pi, u) = C_u' \pi$ for $u \in U$, and the $X$-dimensional cost vector $C_u$ is

$$C_0 = [c(e_1, 0), \ldots, c(e_X, 0)]', \qquad C_u = m_u + (I + A + \cdots + A^{D_u - 1})\, c_u \text{ for } u \in \{1, 2, \ldots, L\},$$

$$c_u = [c(e_1, u), \ldots, c(e_X, u)]', \qquad m_u = [m(e_1, u), \ldots, m(e_X, u)]'.$$

The decision-maker aims to determine the optimal strategy $\mu^* \in \boldsymbol{\mu}$ to minimize (15), i.e.,

$$J_{\mu^*}(\pi_0) = \inf_{\mu \in \boldsymbol{\mu}} J_\mu(\pi_0). \tag{16}$$

The existence of an optimal stationary strategy $\mu^*$ follows from [4, Prop. 1.3, Chapter 3].

Considering the global objective (15), the optimal stationary strategy $\mu^* : \Pi(X) \to U$ and
associated optimal objective $J_{\mu^*}(\pi)$ are the solution of the following "Bellman's stochastic
dynamic programming equation"

$$\mu^*(\pi) = \arg\min_{u \in U} Q(\pi, u), \qquad J_{\mu^*}(\pi) = V(\pi) = \min_{u \in U} Q(\pi, u), \tag{17}$$

$$\text{where } Q(\pi, u) = C(\pi, u) + \sum_{y \in Y} V(T(\pi, y, u))\, \sigma(\pi, y, u), \quad u = 1, \ldots, L, \qquad Q(\pi, 0) = C(\pi, 0).$$

Recall $T(\pi, y, u)$ and $\sigma(\pi, y, u)$ were defined in (10). The above formulation is a generalization
of a partially observed Markov decision process (POMDP), since POMDPs assume finite
observation spaces $Y$ while in our formulation $Y$ can be discrete or continuous (see (5)).
Define the set of belief states where it is optimal to apply action $u = 0$ as

$$S = \{\pi \in \Pi(X) : \mu^*(\pi) = 0\} = \{\pi \in \Pi(X) : Q(\pi, 0) \le Q(\pi, u),\ u \in \{1, 2, \ldots, L\}\}. \tag{18}$$

$S$ is called the stopping set since it is the set of belief states to "declare target state and stop".

Since the belief state space $\Pi(X)$ is uncountable, Bellman's equation (17) does not translate
directly into numerical algorithms. However, in subsequent sections, we exploit the structure of
Bellman's equation to prove various structural results about the optimal strategy $\mu^*$ using lattice
programming and stochastic dominance tools.
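For intuition only, Bellman's equation (17) can be approximated numerically in the two-state case by discretizing $\pi(1)$ on a grid and running a fixed number of value iteration sweeps with linear interpolation. This discretization is an assumption of the sketch below and is not the approach taken in this paper, which instead derives structural results. The function name `solve_bellman_two_state`, the grid size, the number of sweeps and the example cost vectors are all hypothetical, and no convergence or accuracy guarantee is claimed.

```python
def solve_bellman_two_state(A, B, delays, costs, n_grid=101, n_iter=200):
    """Approximate (17) for X = 2 on a uniform grid over p = pi(1).

    costs[0] is C_0 and costs[u] is C_u for u = 1..L, each given as
    [C(e_1, u), C(e_2, u)]. B[x][y] is a finite observation matrix.
    Returns (grid, V, policy) with policy[i] in {0, 1, ..., L}.
    """
    grid = [i / (n_grid - 1) for i in range(n_grid)]

    def linear_cost(u, p):
        return costs[u][0] * p + costs[u][1] * (1.0 - p)   # C_u' pi

    def predict(p, delay):
        v = [p, 1.0 - p]
        for _ in range(delay):                              # (A')^{D_u} pi
            v = [A[0][0] * v[0] + A[1][0] * v[1],
                 A[0][1] * v[0] + A[1][1] * v[1]]
        return v

    def interp(V, p):
        pos = min(max(p, 0.0), 1.0) * (n_grid - 1)
        lo = min(int(pos), n_grid - 2)
        w = pos - lo
        return (1.0 - w) * V[lo] + w * V[lo + 1]

    def q_continue(V, p, u):
        q = linear_cost(u, p)
        pred = predict(p, delays[u - 1])
        for y in range(len(B[0])):
            w1 = B[0][y] * pred[0]
            sigma = w1 + B[1][y] * pred[1]
            if sigma > 0.0:
                q += sigma * interp(V, w1 / sigma)          # V(T) * sigma
        return q

    stop = [linear_cost(0, p) for p in grid]                # Q(pi, 0)
    V = list(stop)
    policy = [0] * n_grid
    for _ in range(n_iter):
        new_V = []
        for i, p in enumerate(grid):
            best, best_u = stop[i], 0
            for u in range(1, len(delays) + 1):
                q = q_continue(V, p, u)
                if q < best:
                    best, best_u = q, u
            new_V.append(best)
            policy[i] = best_u
        V = new_V
    return grid, V, policy


# Hypothetical example: delays D_1 = 1, D_2 = 3 and illustrative costs.
A = [[1.0, 0.0], [0.1, 0.9]]
B = [[0.8, 0.2], [0.3, 0.7]]
costs = [[0.0, 5.0], [1.1, 0.1], [3.1, 0.39]]
grid, V, policy = solve_bellman_two_state(A, B, [1, 3], costs)
```

The returned `policy` list can be inspected along the grid to see where the approximate strategy switches between actions.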

C. Example: Quickest Change Detection with Measurement Control 

We now formulate the quickest detection problem with optimal sampling - this serves as a
useful example to illustrate the above general model. Before proceeding, it is important to recall
that in our model, decisions (whether to stop, or continue and take the next observation sample after
$D_u$ time points) are made at times $\tau_1, \tau_2, \ldots$. In contrast, the state of the Markov chain (which
models the change we want to detect) can change at any time $t$. We need to construct the delay
penalty and false alarm penalties carefully to take this into account.

1. Phase-Distributed (PH) Change time: In quickest detection, the target state (which we label
as state 1 by convention) is an absorbing state. States $2, \ldots, X$ (corresponding to unit vectors
$e_2, \ldots, e_X$) are now fictitious states that form a single composite state that the Markov chain $x_t$
resides in before jumping into the target absorbing state. So the transition matrix (4) is

$$A = \begin{bmatrix} 1 & 0 \\ \underline{A}_{(X-1) \times 1} & \bar{A}_{(X-1) \times (X-1)} \end{bmatrix}. \tag{19}$$

The "change time" $t^*$ denotes the time at which $x_t$ enters the absorbing state 1, i.e.,

$$t^* = \inf\{t > 0 : x_t = 1\}. \tag{20}$$

Of course, in the special case when $x$ is a 2-state Markov chain (i.e., $X = 2$), the change time
$t^*$ in (20) is geometrically distributed.

For the multi-state case, to ensure that $t^*$ is finite, assume states $2, 3, \ldots, X$ are transient. This
is equivalent to $\bar{A}$ in (19) satisfying $\sum_{n=1}^{\infty} \bar{A}^n_{ii} < \infty$ for $i = 1, \ldots, X - 1$ (where $\bar{A}^n_{ii}$ denotes
the $(i, i)$ element of the $n$-th power of matrix $\bar{A}$). With the transition probabilities (19), the
distribution of the change time $t^*$ is given by the PH-distribution

$$P(t^* = 0) = \pi_0(1), \qquad P(t^* = t) = \bar{\pi}_0' \bar{A}^{t-1} \underline{A}, \quad t \ge 1 \tag{21}$$

where $\bar{\pi}_0 = [\pi_0(2), \ldots, \pi_0(X)]'$. By choosing $(\pi_0, A)$ and state space dimension $X$, one can
approximate any given change-time distribution on $[0, \infty)$ by the PH-distribution (21); see [18,
pp. 240-243]. Indeed, PH-distributions form a dense subset for the set of all distributions.
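The PH-distribution (21) is straightforward to evaluate numerically. The following minimal sketch computes the probability mass function of $t^*$ up to a horizon; the function name `ph_pmf`, the zero-based indexing (index 0 is the target state) and the example 3-state chain are illustrative assumptions rather than quantities from this paper.

```python
def ph_pmf(pi0, A, t_max):
    """Return [P(t* = 0), P(t* = 1), ..., P(t* = t_max)] from (21).

    pi0 : full initial distribution, pi0[0] is the target state mass
    A   : full transition matrix of the form (19), row 0 absorbing
    """
    n = len(A) - 1
    A_bar = [row[1:] for row in A[1:]]      # transitions among states 2..X
    A_under = [row[0] for row in A[1:]]     # one-step jump into the target
    v = list(pi0[1:])                       # row vector pi_bar_0' A_bar^{t-1}
    pmf = [pi0[0]]
    for _ in range(t_max):
        pmf.append(sum(v[i] * A_under[i] for i in range(n)))
        v = [sum(v[i] * A_bar[i][j] for i in range(n)) for j in range(n)]
    return pmf


# Two-state chain: the change time is geometric.
geometric = ph_pmf([0.0, 1.0], [[1.0, 0.0], [0.1, 0.9]], 5)

# Hypothetical three-state chain: state 3 -> state 2 -> target state 1.
A3 = [[1.0, 0.0, 0.0],
      [0.2, 0.8, 0.0],
      [0.0, 0.3, 0.7]]
pmf3 = ph_pmf([0.0, 0.0, 1.0], A3, 2000)
```

In the two-state example the result agrees with the geometric law $P(t^* = t) = A_{22}^{t-1}(1 - A_{22})$, consistent with the remark following (20).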

2. Observations: Since states $2, 3, \ldots, X$ are fictitious states that shape the PH-distributed
change time (21), they are indistinguishable in terms of the observation $y$. That is,
$B_{2y} = B_{3y} = \cdots = B_{Xy}$ for all $y \in Y$.

3. Costs: Associated with the quickest detection problem are the following costs.

(i) False Alarm: Let $k^*$ denote the time $\tau_k$ at which decision $u_k = 0$ (stop and announce target
state) is chosen, so that the problem terminates. If the decision to stop is made before the Markov
chain reaches the target state 1, i.e., $k^* < t^*$, then a false alarm penalty $f$ is paid. So the false
alarm penalty is $f\, I(x_{\tau_k} \ne e_1, u_k = 0)$ where $f$ is a user defined non-negative constant.
The expected false alarm penalty based on the accumulated history is

$$\sum_{i \ne 1} f\, E\{I(x_{\tau_k} = e_i, u_k = 0) \mid \mathcal{F}_k\} = f (\mathbf{1}_X - e_1)' \pi_k\, I(u_k = 0). \tag{22}$$

Recall $\mathbf{1}_X$ denotes the $X$-dimensional vector of ones.

(ii) Delay cost of continuing: Suppose decision $u_k \in \{1, 2, \ldots, L\}$ is taken at time $\tau_k$. So the next
sampling time is $\tau_{k+1} = \tau_k + D_{u_k}$. Then for any time $t \in [\tau_k, \tau_{k+1} - 1]$, the event $\{x_t = e_1, u_k\}$
signifies that a change has occurred but not been announced by the decision maker. Since the
decision maker can make the next decision (to stop or continue) at $\tau_{k+1}$, the delay cost incurred
in the time interval $[\tau_k, \tau_{k+1} - 1]$ is $d \sum_{t=\tau_k}^{\tau_{k+1}-1} I(x_t = e_1, u_k)$ where $d$ is a non-negative constant.
The expected delay cost in the interval $[\tau_k, \tau_{k+1} - 1] = [\tau_k, \tau_k + D_{u_k} - 1]$ is

$$d \sum_{t=\tau_k}^{\tau_{k+1}-1} E\{I(x_t = e_1, u_k) \mid \mathcal{F}_k\} = d\, e_1' (I + A + \cdots + A^{D_{u_k}-1})' \pi_k, \quad u_k \in \{1, 2, \ldots, L\}. \tag{23}$$

(iii) Measurement Sampling Cost: Suppose decision $u_k \in \{1, 2, \ldots, L\}$ is taken at time $\tau_k$. As
in (15), let $m_{u_k} = (m(x_{\tau_k} = e_i, u_k),\ i \in \mathcal{X})$ denote the non-negative measurement cost vector for
choosing to take a measurement. For convenience, assume the measurement cost when choosing
$u = 0$ (stop) is zero. Next, since in quickest detection, states $2, \ldots, X$ are fictitious states that
are indistinguishable in terms of cost, choose $m(e_2, u) = \cdots = m(e_X, u)$.
Examples of measurement sampling costs are:

(a) $m(e_i, u)$ is independent of state $i$ and action $u$. This simple choice of a constant measurement
cost at each time still results in non-trivial global costs for the decision maker since this cost
is incurred each time a measurement is made - so choosing a decision $u$ with smaller sampling
delay will result in more measurements until the final decision to stop, thereby incurring a higher
total measurement cost for the global decision maker.

(b) $m(e_i, u)$ is decreasing in $u$ for $u \ne 0$. Choosing $m(e_i, u)$ to decrease in $u$ penalizes choosing
small sampling intervals even more than a constant cost.

Summary and Kolmogorov-Shiryayev criterion: To summarize, the costs $C(\pi, u)$ for quickest
detection with optimal sampling and PH-distributed change time are

$$C(\pi, 0) = f (\mathbf{1}_X - e_1)' \pi, \qquad C(\pi, u) = c_u' \pi + m_u' \pi \text{ for } u \in \{1, 2, \ldots, L\},$$

$$\text{where } c_u = d\, (I + A + \cdots + A^{D_u - 1})\, e_1. \tag{24}$$

For constant measurement cost $m(x, u) = m$, $u \in \{1, 2, \ldots, L\}$, the quickest detection optimal
sampling objective (15) with costs (24) can be expressed as

$$J_\mu(\pi_0) = d\, E^\mu_{\pi_0}\{(k^* - t^*)^+\} + f\, P^\mu_{\pi_0}(k^* < t^*) + m \sum_{u=1}^{L} \sum_{k : \tau_k < k^*} P^\mu_{\pi_0}(u_k = u) \tag{25}$$

where the PH-distributed change time $t^*$ and $k^*$ are defined in (20), (13). For the special case
$U = \{0 \text{ (stop)}, D_1 = 1\}$, measurement cost $m_u = 0$, geometrically distributed $t^*$ (so $X = 2$),
(25) becomes the Kolmogorov-Shiryayev criterion for detection of disorder [22].
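The cost vectors in (24) can be assembled directly from $A$, $D_u$, $d$, $f$ and the measurement cost vector. The sketch below is a minimal illustration; the helper names `continue_cost_vector` and `stop_cost_vector`, the zero-based indexing (index 0 is the target state $e_1$) and the numerical values are assumptions made for the example.

```python
def continue_cost_vector(A, delay, d, m_vec):
    """Return C_u = m_u + d (I + A + ... + A^{D_u - 1}) e_1 as in (24)."""
    n = len(A)
    w = [1.0] + [0.0] * (n - 1)             # A^0 e_1
    acc = [0.0] * n
    for _ in range(delay):
        acc = [acc[i] + w[i] for i in range(n)]
        w = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]  # A w
    return [m_vec[i] + d * acc[i] for i in range(n)]


def stop_cost_vector(n, f):
    """Return C_0 = f (1_X - e_1): no penalty if the chain is in the target."""
    return [0.0] + [f] * (n - 1)


def expected_cost(cost_vec, pi):
    """Return C(pi, u) = C_u' pi."""
    return sum(c * p for c, p in zip(cost_vec, pi))


# Hypothetical values: 2-state chain, d = 1, f = 5, constant m = 0.1.
A = [[1.0, 0.0], [0.1, 0.9]]
C0 = stop_cost_vector(2, 5.0)
C1 = continue_cost_vector(A, 1, 1.0, [0.1, 0.1])
C3 = continue_cost_vector(A, 3, 1.0, [0.1, 0.1])
```

With a constant measurement cost, a longer delay raises the entry for the target state (more unannounced time after the change) while the per-decision measurement cost stays the same, which is the trade-off described above.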
III. Structural Results for Optimal Sampling Policy $\mu^*(\pi)$ for 2-state case

This section analyzes the structure of the optimal sampling strategy $\mu^*(\pi)$ (solution of Bellman's
equation (17)) for two-state Markov chains ($X = 2$). Recall that two-state Markov chains
model geometrically distributed change times in quickest detection problems.

We list the following assumptions that will be used in this section.

(A1) (i) The costs $C(e_i, u)$ in (15) are increasing with $i \in X$ for each $u$.
     (ii) The target state $e_1$ belongs to the stopping set $S$ defined in (18).
(A2) The transition matrix $A$ is totally positive of order 2 (TP2). That is, all second order minors are non-negative.
(A3) The observation matrix $B$ is TP2.
(A4) $C(e_i, u)$ is submodular for $u \in \{1, 2, \ldots, L\}$, that is, $C(e_i, u+1) - C(e_i, u)$ is decreasing$^2$ in $i \in X$.

Consider the following assumption, where $A^{D_u}|_{ij}$ denotes the $(i,j)$ element of the matrix $A^{D_u}$:

(A5-(i)) For each $q \in X$, $\sum_{j \ge q} A^{D_u}|_{ij}$ is submodular. That is, $\sum_{j \ge q} \left( A^{D_{u+1}}|_{ij} - A^{D_u}|_{ij} \right)$ is decreasing in $i \in X$ for $u = 1, 2, \ldots, L-1$.
(A5-(ii)) $A^{D_u}|_{22}$ and $A^{D_u}|_{12}$ are decreasing in $u \in \{1, 2, \ldots, L\}$.

These assumptions are discussed below in Sec. III-B and hold for quickest detection problems.
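The TP2 and submodularity conditions above are easy to check numerically for a given model. The following is a minimal sketch in Python with NumPy, not taken from the paper: the helper names `is_tp2` and `a5_i_holds` are hypothetical, and the two-state absorbing matrix with $A_{22} = 0.9$ and the delays $\{1, 3, 5, 10\}$ are illustrative values only.

```python
import numpy as np
from itertools import combinations

def is_tp2(M, tol=1e-12):
    """Return True if every second-order minor of M is non-negative (TP2)."""
    M = np.asarray(M, dtype=float)
    for i1, i2 in combinations(range(M.shape[0]), 2):
        for j1, j2 in combinations(range(M.shape[1]), 2):
            if M[i1, j1] * M[i2, j2] - M[i1, j2] * M[i2, j1] < -tol:
                return False
    return True

def a5_i_holds(A, delays, tol=1e-12):
    """Check (A5-(i)): for each q, sum_{j>=q} (A^{D_{u+1}} - A^{D_u})_{ij}
    is decreasing (non-increasing) in the row index i."""
    A = np.asarray(A, dtype=float)
    powers = [np.linalg.matrix_power(A, D) for D in delays]
    for P_lo, P_hi in zip(powers[:-1], powers[1:]):
        diff = P_hi - P_lo
        # tail[i, q] = sum over columns j >= q of diff[i, j]
        tail = np.cumsum(diff[:, ::-1], axis=1)[:, ::-1]
        if np.any(np.diff(tail, axis=0) > tol):
            return False
    return True

# Illustrative two-state chain with absorbing target state 1 (assumed values)
A = np.array([[1.0, 0.0],
              [0.1, 0.9]])
tp2_ok = is_tp2(A)
a5_ok = a5_i_holds(A, [1, 3, 5, 10])
```

For this absorbing chain both checks succeed, which agrees with the statement that the assumptions hold for quickest detection problems.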

A. Optimality of Threshold Policy for Sequential Optimal Sampling 

Note that for a 2-state Markov chain ($X = 2$), the belief state space $\Pi(X)$ is the one-dimensional simplex $\pi(1) + \pi(2) = 1$. So it suffices to represent $\pi$ by its first element $\pi(1)$.



Theorem 1: Consider the optimal sampling problem of Sec. II with state dimension $X = 2$ and action space $\mathcal{U}$. Then the optimal strategy $\mu^*(\pi)$ in (17) has the following structure:

(i) The optimal stopping set $S$ (18) is a convex subset of $\Pi(X)$. Therefore under (AO), the stopping set is the interval $S = (\pi_1^*, 1]$ where the threshold $\pi_1^* \in [0, 1]$.

(ii) Under assumptions (A1-A5), the optimal sampling strategy $\mu^*(\pi)$ in (17) is decreasing with $\pi(1)$. Thus, there exist up to $L$ thresholds denoted $\pi_1^*, \ldots, \pi_L^*$ with $0 \le \pi_L^* \le \pi_{L-1}^* \le \cdots \le \pi_1^* \le 1$ such that the optimal strategy satisfies

$$
\mu^*(\pi) =
\begin{cases}
L \ \text{(sample after } D_L \text{ time points)} & \text{if } 0 \le \pi(1) \le \pi_L^* \\
L-1 \ \text{(sample after } D_{L-1} \text{ time points)} & \text{if } \pi_L^* < \pi(1) \le \pi_{L-1}^* \\
\quad\vdots & \quad\vdots \\
1 \ \text{(sample after } D_1 \text{ time points)} & \text{if } \pi_2^* < \pi(1) \le \pi_1^* \\
0 \ \text{(announce change)} & \text{if } \pi_1^* < \pi(1) \le 1
\end{cases}
\qquad (26)
$$

where the sampling delays are ordered as $D_1 < D_2 < \cdots < D_L$.

2 Throughout this paper, we use the term "decreasing" in the weak sense. That is, "decreasing" means non-increasing. Similarly, the term "increasing" means non-decreasing.
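A threshold policy of this form is cheap to evaluate once the thresholds are known. The sketch below is illustrative only: the function name `threshold_policy` and the example threshold values are assumptions, and boundary points are assigned so that the stopping set is the half-open interval $(\pi_1^*, 1]$ of part (i).

```python
import numpy as np

def threshold_policy(pi1, thresholds):
    """Evaluate a monotone threshold policy of the form (26).

    thresholds = [pi_1*, pi_2*, ..., pi_L*], ordered so that
    pi_1* >= pi_2* >= ... >= pi_L*.
    Returns 0 (announce change) or a sampling action u in {1, ..., L}.
    """
    t = np.asarray(thresholds, dtype=float)
    if np.any(np.diff(t) > 0):
        raise ValueError("thresholds must be non-increasing")
    # The action equals the number of thresholds that pi1 does not exceed:
    # none -> stop, only pi_1* -> action 1, ..., all L -> action L.
    return int(np.sum(pi1 <= t))

# Hypothetical thresholds for L = 3 sampling intervals
example_thresholds = [0.8, 0.5, 0.2]
actions = [threshold_policy(p, example_thresholds) for p in np.linspace(0.0, 1.0, 11)]
```

Because the action is a count of thresholds lying at or above $\pi(1)$, the resulting strategy is automatically non-increasing in $\pi(1)$.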





The proof of Theorem 1 is in Appendix B. As an example, consider quickest detection with optimal sampling for geometrically distributed change time. From (19), the transition matrix is

$A = \begin{bmatrix} 1 & 0 \\ 1 - A_{22} & A_{22} \end{bmatrix}$ and the expected change time is $E\{t^*\} = \dfrac{1}{1 - A_{22}}$, where $t^*$ is defined in (20).
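For completeness, the expected change time can be derived in one line. The derivation below is a sketch under the assumption that the chain starts in state 2 and that $t^*$ counts the steps until absorption in state 1, so that $P(t^* = k) = A_{22}^{k-1}(1 - A_{22})$ for $k \ge 1$:

```latex
E\{t^*\} = \sum_{k=1}^{\infty} k\, A_{22}^{\,k-1}\,(1 - A_{22})
         = (1 - A_{22})\,\frac{d}{dA_{22}} \sum_{k=0}^{\infty} A_{22}^{\,k}
         = \frac{1 - A_{22}}{(1 - A_{22})^{2}}
         = \frac{1}{1 - A_{22}}, \qquad 0 \le A_{22} < 1 .
```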



Theorem 2: Consider the quickest detection problem with optimal sampling and geometrically distributed change time formulated in Sec. II-C with costs defined in (24). Assume the measurement cost $m(x, u)$ satisfies (A1) and (A4), e.g., the measurement cost is a constant. Then if (A3) holds, Theorem 1 holds. So the optimal sampling strategy (26) makes measurements less frequently when away from the target state and more frequently when closer to the target state. (Note that (A1)(ii), (A2), (A5) hold automatically and no assumptions are required on the delay or stopping costs in (24).)



There are two main conclusions regarding Theorem 2. First, for constant measurement cost, Theorem 2 holds without any assumptions for Gaussians, exponentials, and several other classes of observation distributions that satisfy (A3). Second, the optimal strategy $\mu^*(\pi)$ is monotone in the posterior $\pi(1)$ and therefore has a finite dimensional characterization. To determine the optimal strategy, one only needs to determine (estimate) the values of the $L$ thresholds $\pi_1^*, \ldots, \pi_L^*$. These can be estimated via a simulation-based stochastic optimization algorithm. We will give bounds for these threshold values in Sec. IV. Fig. 1 illustrates such a monotone policy.

A short word on the proof presented in Appendix B. It involves analyzing the structure of Bellman's equation (17). It will be shown that $Q(\pi, u)$ in (17) is a submodular function (defined in Appendix B) on the partially ordered set $[\Pi(X), \ge_r]$, which constitutes a lattice. Here $\ge_r$ denotes the monotone likelihood ratio stochastic order defined in Sec. V-C. For $X = 2$, $\Pi(X)$ is the unit interval $[0, 1]$, and in this case $[\Pi(X), \ge_r]$ is a chain (totally ordered set) and $\ge_r$ is equivalent to first order stochastic dominance. For $X > 2$, considered in the next section, a similar idea is used to bound the optimal policy on $[\Pi(X), \ge_r]$.

Remark: Interpretation of [3]. We comment here briefly on the recent paper [3], which considers quickest detection with measurement control where at each time the decision is made whether to take a measurement or not. This can be formulated as our optimal sampling problem by considering the action space $\mathcal{U} = \{0 \text{ (announce change)}, D_1, D_2, \ldots, D_L\}$ with sampling interval $D_i = i$ and $L$ chosen sufficiently large. In [3], a different action space is chosen, namely $\{0 \text{ (announce change)}, m \text{ (take measurement)}, \bar{m} \text{ (no measurement)}\}$. With this action space, [3] shows that the optimal strategy is not necessarily monotone in the posterior $\pi$. However, with the action space $\mathcal{U}$ defined above, Theorem 2 shows that the optimal strategy indeed is monotone.

We can interpret the non-monotone optimal strategy in [3] as follows. Our action 1 (sample next point) corresponds to action $m$ in [3], action 2 (sample after 2 points) corresponds to $(\bar{m}, m)$, action 3 (sample after 3 points) corresponds to $(\bar{m}, \bar{m}, m)$, etc. Reading off the monotone optimal strategy $\mu^*(\pi) \in \{3, 2, 1\}$ versus $\pi$ using the action space of [3] yields the strategy $\{\bar{m}, \bar{m}, m, \bar{m}, m, m\}$, which is non-monotone due to the presence of the action $\bar{m}$ sandwiched between two $m$'s.$^3$
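The correspondence between the two action spaces can be made concrete with a few lines of code. This is only an illustrative sketch: the function name and the string labels `"skip"` (no measurement) and `"m"` (take measurement) are hypothetical, and it assumes sampling intervals $D_i = i$ as in the remark.

```python
def to_measurement_sequence(actions):
    """Expand sampling actions (wait u steps, then sample) into the
    per-time-step alphabet 'skip' (no measurement) / 'm' (measure)."""
    sequence = []
    for u in actions:
        # u - 1 time points are skipped, then one measurement is taken
        sequence.extend(["skip"] * (u - 1) + ["m"])
    return sequence

# The monotone strategy 3, 2, 1 read as per-step decisions
per_step = to_measurement_sequence([3, 2, 1])
print(per_step)  # ['skip', 'skip', 'm', 'skip', 'm', 'm']
```

The fourth entry is a `skip` sandwiched between two measurements, which is the non-monotone pattern described above.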



B. Discussion of Assumptions A1-A5

To illustrate the assumptions of Theorem 1, we will now prove Theorem 2 by showing that assumptions (A1-A5) hold. Recall from (19) that for quickest detection with geometric change time, the transition matrix is

$$
A = \begin{bmatrix} 1 & 0 \\ 1 - A_{22} & A_{22} \end{bmatrix}.
\qquad \text{So} \qquad
A^{D_u} = \begin{bmatrix} 1 & 0 \\ 1 - A_{22}^{D_u} & A_{22}^{D_u} \end{bmatrix}.
\qquad (27)
$$
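The closed form for the multi-step transition matrix can be confirmed numerically. A minimal sketch, assuming NumPy; the value $A_{22} = 0.9$, the delays, and the helper name `absorbing_power` are illustrative choices, not part of the paper.

```python
import numpy as np

def absorbing_power(a22, D):
    """Closed form of the D-step transition matrix for the two-state
    chain with absorbing state 1 and self-transition probability a22."""
    return np.array([[1.0, 0.0],
                     [1.0 - a22 ** D, a22 ** D]])

a22 = 0.9  # illustrative value
A = np.array([[1.0, 0.0],
              [1.0 - a22, a22]])

# Compare the closed form against direct matrix powers
closed_form_ok = all(
    np.allclose(np.linalg.matrix_power(A, D), absorbing_power(a22, D))
    for D in (1, 3, 5, 10)
)
```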



3 (5) also contains a very nice performance analysis of sub-optimal and nearly optimal strategies. This analysis may be 
applicable to our setup due to similarity of the models. 
1) Assumption (A1)(i): This requires the elements of the cost vector to be increasing. However, in quickest detection, the instantaneous cost $C(e_i, u)$ for $u \ge 1$ defined in (24) is decreasing in $i$ if the measurement cost $m$ is a constant. But all is not lost. Remarkably, a clever transformation can be applied to make a transformed version of the cost increase with $i$ and yet ensure (A4) holds and keep the optimal strategy unchanged! This transformation is crucial for proving Theorem 2, particularly for constant measurement cost. (Assuming a measurement cost increasing in the state $i$ makes the proof easier but may be unrealistic in applications.) We define this transformation via the following theorem, which exploits the special structure of the quickest detection problem.

Theorem 3: Consider the quickest time detection problem with costs defined in (24). Assume the sampling cost $m(e_i, u)$ is a constant. Define the transformed costs $\underline{C}(\pi, u)$ as follows:

$$
\begin{aligned}
\underline{C}(\pi, 0) &= C(\pi, 0) - \alpha\, C(\pi, L), \\
\underline{C}(\pi, u) &= C(\pi, u) - \alpha\, C(\pi, L) + \alpha \sum_{y} C\big(T(\pi, y, u), L\big)\, \sigma(\pi, y, u), \quad u \in \{1, 2, \ldots, L\}
\end{aligned}
\qquad (28)
$$

for any constant $\alpha \in \mathbb{R}$. Then:

(i) Bellman's equation (17) applied to optimize the global objective (15) with transformed costs $\underline{C}(\pi, u)$ yields the same optimal strategy as the global objective with original costs $C(\pi, u)$.

(ii) Choosing $\alpha = 1/(1 - A_{22}^{D_L})$ implies $\underline{C}(e_i, u)$ satisfies (A1) and (A4).
Theorem 3 is proved in Appendix C. It asserts that the transformed costs $\underline{C}(e_i, u)$ satisfy (A1) and (A4) even for constant measurement cost. Therefore Theorem 1 holds and the optimal strategy for the transformed costs is monotone in the posterior distribution. Note that Theorem 3 also says that the optimal strategy $\mu^*(\pi)$ is unchanged by this transformation. Thus Theorem 1 holds for the original quickest detection costs, thereby proving Theorem 2.

Assumption (A1)(ii) is natural for the stopping problem to be well defined. It says that if it was known with certainty that the target state $e_1$ has been reached, then it is optimal to stop. For quickest time detection it holds trivially since $C(e_1, 0) \le C(\pi, u)$ for $u \in \{1, \ldots, L\}$, $\pi \in \Pi(X)$.

2) Assumption (A2): From the structure of the transition matrix $A$ in (27), (A2) clearly holds automatically for the quickest detection problem. For numerous examples of TP2 transition matrices, see BH. Also, $A$ does not need to have an absorbing state for Theorem 1 to hold.

3) Assumption (A3): Numerous continuous and discrete noise distributions satisfy the TP2 property, see [9]. Examples include Gaussian, exponential, binomial, and Poisson distributions. Examples of discrete observation distributions satisfying (A3) include binary erasure channels (see Sec. VI). A binary symmetric channel with error probability less than 0.5 also satisfies (A3).
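The claim for the binary symmetric channel can be checked directly. The following worked equation is a sketch that assumes the channel is written as a $2 \times 2$ observation matrix with crossover probability $p$, with states and symbols labeled in the same order (an assumed labeling). The only second-order minor is the determinant:

```latex
B = \begin{bmatrix} 1-p & p \\ p & 1-p \end{bmatrix}, \qquad
\det B = (1-p)^2 - p^2 = 1 - 2p \;\ge\; 0
\quad \Longleftrightarrow \quad p \le \tfrac{1}{2}.
```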

4) Assumption (A4): In general, Theorem 1 requires the costs $C(e_i, u)$ to be submodular. However, for the special case of quickest detection with optimal sampling, Theorem 3 shows that only the measurement cost $m(e_i, u)$ needs to be submodular, i.e., $m(e_i, u+1) - m(e_i, u)$ is decreasing in $i$. This holds trivially if the measurement cost is independent of the state.

5) Assumption (A5): This is a submodularity condition on the transition matrix. Since from 
(f27T) A Du \ 21 = and A Du \ 22 = A 22 u , clearly (A5) holds automatically for the quickest detection 
problem with optimal sampling. 

IV. Myopic Bounds to Optimal Strategy for multi-state Markov Chain 

This section considers the optimal sampling problem for multi-state Markov chains ($X \ge 2$). Recall that multi-state Markov chains can model PH-distributed change times (19) in quickest detection problems. Theorems 4 and 5 are the main results of this section. They characterize the structure of the optimal strategy $\mu^*(\pi)$, which is the solution of Bellman's equation (17).

Define the following ordering of two arbitrary transition matrices A^ and A^: 

A« £ A^ if < iJ + hme X. (29) 

The following are the main assumptions used in this section: 
(A6) The transition matrix satisfies $A \succeq A^2$, where the ordering $\succeq$ is defined in (29).
(A7) There exists a positive constant $\alpha$ satisfying

(i) $\alpha \ge f / \big(d\,(1 - A_{21})\big)$

(ii) $\sum_{l=0}^{D_u - 1} A^{l}|_{i1} + \alpha \left( A_{i1} - A^{D_u + 1}|_{i1} \right)$ is decreasing in $i = 2, \ldots, X$.

(A6) and (A7) are discussed at the end of this section. (A6) and (A7) hold trivially for the two-state Markov chain case ($X = 2$) with absorbing state when $A$ is of the form (27). Examples for $X \ge 3$ are given in Sec. VI.

A. Myopic Lower Bound to Optimal Policy 

For a multi-state Markov chain observed in noise, determining sufficient conditions for the optimal strategy to have a monotone structure is an open problem. In this section we show that the optimal sampling strategy $\mu^*(\pi)$ is lower bounded by a myopic strategy. Define the myopic strategy $\underline{\mu}(\pi)$ and myopic stopping set $\underline{S}$ by

$$
\begin{aligned}
\underline{\mu}(\pi) &= \arg\min_{u \in \{0, 1, \ldots, L\}} C(\pi, u), \\
\underline{S} &= \{\pi \in \Pi(X) : \underline{\mu}(\pi) = 0\} = \{\pi \in \Pi(X) : C(\pi, 0) < C(\pi, u),\ u \in \{1, 2, \ldots, L\}\}
\end{aligned}
\qquad (30)
$$

So $\underline{S}$ is the set of belief states for which the myopic strategy declares stop.
The following is the main result of this section. The proof is in Appendix D.



Theorem 4: Consider the sequential sampling problem of Sec. II with optimal strategy specified by (17). Then

1) The stopping set $S$ defined in (18) is a convex subset of the belief state space $\Pi(X)$.

2) $\underline{S} \subset S$, where $\underline{S}$ is the myopic stopping set defined in (30).

3) Under (A1), (A2), (A3), (A6), the myopic strategy $\underline{\mu}(\pi)$ defined in (30) forms a lower bound for the optimal strategy $\mu^*(\pi)$, i.e., $\mu^*(\pi) \ge \underline{\mu}(\pi)$ for all $\pi \in \Pi(X) - S$.

4) If (A4) holds, then the myopic strategy $\underline{\mu}(\pi)$ is increasing with $\pi$ (with respect to the monotone likelihood ratio (MLR) stochastic order to be defined in Sec. V-C).



The above theorem says that the myopic strategy $\underline{\mu}(\pi)$, comprising increasing step functions, lower bounds the optimal strategy $\mu^*(\pi)$. The myopic strategy specified by (30) is computed trivially on the simplex $\Pi(X)$. Therefore, the above theorem gives a useful lower bound $\underline{\mu}(\pi)$ for the optimal strategy $\mu^*(\pi)$ (which is intractable to compute). Also, since $\underline{\mu}(\pi)$ is sub-optimal, it incurs a higher cost compared to the optimal strategy. This cost associated with $\underline{\mu}(\pi)$ can be evaluated by simulation and forms an upper bound to the optimal achievable cost.
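Since the myopic strategy only requires a one-step cost comparison, it can be sketched in a few lines. The code below is an illustration under stated assumptions: it uses the quickest detection cost structure described in the summary of Sec. II (stopping cost $f(\mathbf{1}_X - e_1)'\pi$, running cost $c_u'\pi + m_u'\pi$), the function names are hypothetical, and the numerical values of $A$, the delays, $f$, $d$ and $m$ are illustrative. Ties are resolved by `argmin` in favor of the smallest action index, which differs from the strict inequality in (30) only on a boundary set.

```python
import numpy as np

def quickest_detection_costs(A, delays, f, d, m):
    """Stack the cost vectors: row 0 is the stopping cost f (1 - e_1);
    row u >= 1 is c_u + m_u with c_u = d (I + A + ... + A^{D_u - 1}) e_1."""
    X = A.shape[0]
    e1 = np.eye(X)[0]
    rows = [f * (np.ones(X) - e1)]
    for u, D in enumerate(delays, start=1):
        S = sum(np.linalg.matrix_power(A, k) for k in range(D))
        rows.append(d * S @ e1 + m[u - 1])   # m[u - 1] may be a scalar or a length-X vector
    return np.vstack(rows)                   # shape (L + 1, X)

def myopic_action(pi, costs):
    """Myopic strategy: argmin over u in {0, ..., L} of C(pi, u) = costs[u] @ pi."""
    return int(np.argmin(costs @ np.asarray(pi, dtype=float)))

# Illustrative two-state example (assumed values)
A = np.array([[1.0, 0.0],
              [0.1, 0.9]])
costs = quickest_detection_costs(A, delays=[1, 3, 5, 10], f=17.0, d=0.4, m=[1.0] * 4)
action_at_target = myopic_action([1.0, 0.0], costs)   # belief concentrated on the target state
action_far_away = myopic_action([0.0, 1.0], costs)    # belief concentrated on the non-target state
```

With a constant measurement cost, the myopic rule stops at the target state and otherwise selects the shortest delay, since longer delays only add expected delay cost in the one-step comparison.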

B. Quickest Detection with Optimal Sampling for PH-Distributed Change Time 

We now use Theorem 4 to construct a myopic strategy that upper bounds the optimal strategy for quickest detection with sampling for PH-distributed change time. However, (A1) does not hold for the quickest detection costs $C(\pi, u)$ in (24). To proceed, it is convenient to define the following transformed cost $\bar{C}(\pi, u)$ and myopic strategy $\bar{\mu}(\pi)$:

$$
\begin{aligned}
\bar{C}(\pi, 0) &= C(\pi, 0) + \alpha\, d\, e_1' A' \pi, \\
\bar{C}(\pi, u) &= C(\pi, u) + \alpha\, d\, e_1' A' \pi - \alpha\, d\, e_1' (A')^{D_u + 1} \pi, \quad u \in \{1, 2, \ldots, L\}, \\
\bar{\mu}(\pi) &= \arg\min_{u \in \{1, 2, \ldots, L\}} \bar{C}(\pi, u).
\end{aligned}
\qquad (31)
$$

It will be shown in the proof of the theorem below that the optimal strategy for the global objective (15) with these transformed costs $\bar{C}(\pi, u)$ remains unchanged and is still $\mu^*(\pi)$.

The main result regarding quickest detection with optimal sampling for PH-distributed change times is as follows. The proof is in Appendix E.



Theorem 5: Consider the quickest detection optimal sampling problem for PH-distributed change time ($X \ge 2$) defined in Sec. II-C with costs in (24) and transformed costs (31). Then

1) The optimal stopping set $S$ (18) is a convex subset of the belief state space $\Pi(X)$ and contains $e_1$.

2) The optimal stopping set is lower bounded by the myopic stopping set $\underline{S}$ in (30), i.e., $\underline{S} \subset S$.

3) Under (A2), (A3), (A6), (A7), for $\pi \in \Pi(X) - S$, the myopic strategy $\bar{\mu}(\pi)$ in (31) upper bounds the optimal strategy, i.e., $\bar{\mu}(\pi) \ge \mu^*(\pi)$ for all $\pi \in \Pi(X) - S$. Moreover, $\bar{\mu}(\pi)$ is increasing in $\pi$ with respect to the MLR order.

4) For the case of geometrically distributed change time ($X = 2$), (A2) and (A6) hold automatically. So if (A3) holds and $\alpha$ is chosen according to (A7)(i), then $\bar{\mu}(\pi) \ge \mu^*(\pi)$ for all $\pi \in \Pi(X)$. Also, $\bar{\mu}(\pi)$ is decreasing in $\pi(1)$.

Discussion: Theorem 5 gives a lot of analytical mileage in terms of bounding the optimal strategy. Statement (1) characterizes the convexity of the stopping set and Statement (2) lower bounds $S$. Statement (3) asserts that the optimal sampling strategy $\mu^*$ can be upper bounded for PH-distributed change times by the myopic strategy $\bar{\mu}$.

Statement (4) shows that for geometrically distributed change times, the bounds on $\mu^*$ and $S$ apply without requiring any assumptions apart from that on the observation distribution (A3). In particular, Statement (4) together with Theorem 2 say that the optimal strategy $\mu^*(\pi)$ is monotone in $\pi(1)$, and upper bounds for the threshold values $\pi_1^*, \pi_2^*, \ldots, \pi_L^*$ can be constructed using the myopic strategy $\bar{\mu}(\pi)$. Therefore these upper bounds can be used to initialize a stochastic optimization algorithm to estimate the thresholds of the optimal monotone policy.

Discussion of Assumptions (A6) and (A7): (A6) is a sufficient condition to preserve monotone likelihood ratio (MLR) dominance of the belief state with a one-step predictor. The MLR stochastic order $\ge_r$ is defined in Sec. V-C. (A6) ensures that if two belief states satisfy $\pi_1 \ge_r \pi_2$, then the one-step ahead Bayesian predictor satisfies $A'\pi_1 \ge_r A'\pi_2$. Theorem 9 in Sec. V-C analyzes this and other stochastic dominance properties of Bayesian filters and predictors that are crucial to prove the results of this paper.

(A7) is sufficient for the transformed cost $\bar{C}(\pi, u)$ defined in (31) to satisfy the following: $-\bar{C}(\pi, u)$ satisfies (A1) and $\bar{C}(\pi, u)$ satisfies (A4). The fact that $\bar{C}(\pi, u)$ is decreasing (since its negative satisfies (A1)) gives an upper bound by a proof completely analogous to Theorem 4.

V. Performance and Sensitivity of Optimal Strategy 

In previous sections, we have presented structural results on monotone optimal strategies. 
In comparison, this section focuses on achievable costs attained by the optimal strategy. This 
section presents two results. First, we give bounds on the achievable performance of the optimal 
strategies by the decision maker. This is done by introducing a partial ordering of the transition 
and observation probabilities - the larger these parameters with respect to this order, the larger 
the optimal cost incurred. Thus we can compare models and bound the achievable performance 
of a computationally intractable problem. Second, we give explicit bounds on the sensitivity of 
the total sampling cost with respect to sampling model - this bound can be expressed in terms 
of the Kullback-Leibler divergence. Such a robustness result is useful since even if a model 
violates the assumptions of the previous section, as long as the model is sufficiently close to a 
model that satisfies the conditions, then the optimal strategy is close to a monotone strategy. 

A. How does total cost of the optimal sampling strategy depend on state dynamics? 

Consider the optimal sampling problem formulated in Sec. II. How does the optimal expected cost $J_{\mu^*}$ defined in (16) vary with transition matrix $A$ and observation matrix $B$? In particular, is it possible to devise an ordering for transition matrices and observation distributions such that the larger they are, the smaller the optimal sampling cost? Such a result would allow us to compare the optimal performance of different sampling models, even though computing these is intractable. Moreover, characterizing how the optimal achievable cost varies with transition matrix and observation distribution is useful. Recall that in quickest detection the transition matrix specifies the change time distribution. In sensor scheduling applications the transition matrix specifies the mobility of the state. The observation matrix specifies the noise distribution.

Consider two distinct models $\theta = (A, B)$ and $\bar{\theta} = (\bar{A}, \bar{B})$ of the optimal sampling problem, where $A, \bar{A}$ are transition matrices and $B, \bar{B}$ are observation matrices. Let $\mu^*(\theta)$ and $\mu^*(\bar{\theta})$ denote, respectively, the optimal strategies for these two different models. Let $J_{\mu^*(\theta)}(\pi; \theta) = V(\pi; \theta)$ and $J_{\mu^*(\bar{\theta})}(\pi; \bar{\theta}) = V(\pi; \bar{\theta})$ denote the optimal value functions in (17) corresponding to applying the respective optimal strategies. Recall also that the costs in (15) depend on the transition matrix $A$. So we will use the notation $C(\pi, u; \theta)$ and $C(\pi, u; \bar{\theta})$ to make the dependence of the cost on the transition matrix explicit.

Introduce the following reverse Blackwell ordering [21] of observation distributions. Let $B$ and $\bar{B}$ denote two observation distributions defined as in (5). Then $B$ reverse Blackwell dominates $\bar{B}$, denoted as

$B \succeq_B \bar{B}$ if $B = \bar{B} R$ (32)

where $R = (R_{lm})$ is a stochastic kernel, i.e., $\sum_m R_{lm} = 1$. This means that $\bar{B}$ yields more accurate measurements of the underlying state than $B$.
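The effect of post-multiplying an observation matrix by a stochastic kernel can be illustrated numerically. This is a sketch with assumed values: the matrices `B_sharp` and `R` are illustrative, and the names are hypothetical. Whichever observation matrix is obtained as a product with $R$ is the less informative one, which the code shows through the total variation distance between the rows (the observation distributions of the two states).

```python
import numpy as np

B_sharp = np.array([[0.3, 0.7, 0.0],
                    [0.0, 0.2, 0.8]])      # illustrative observation matrix (rows: states)
R = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])            # stochastic kernel: each row sums to 1
B_noisy = B_sharp @ R                      # garbled observations

def row_tv(B):
    """Total variation distance between the observation laws of states 1 and 2."""
    return 0.5 * np.abs(B[0] - B[1]).sum()

tv_sharp = row_tv(B_sharp)
tv_noisy = row_tv(B_noisy)
```

The rows of `B_noisy` are harder to tell apart than those of `B_sharp`, so measurements through `B_noisy` carry less information about the state.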

The question we pose is: How does the optimal cost $J_{\mu^*(\theta)}(\pi; \theta)$ vary with transition matrix $A$ and observation distribution $B$? For example, in the quickest detection optimal sampling problem, do certain phase-type distributions result in larger total optimal cost compared to other phase-type distributions?

Theorem 6: Consider two optimal sampling problems with models $\theta = (A, B)$ and $\bar{\theta} = (\bar{A}, \bar{B})$, respectively. Assume $A \succeq \bar{A}$ with respect to ordering (29), $B \succeq_B \bar{B}$ with respect to ordering (32), and (A1), (A2), (A3) hold. Then the expected total costs incurred by the optimal sampling strategies satisfy $J_{\mu^*(\theta)}(\pi; \theta) \ge J_{\mu^*(\bar{\theta})}(\pi; \bar{\theta})$. That is, the larger the transition matrix and observation matrix (with respect to the partial ordering (29) and (32)), the lower the expected total cost of the optimal sampling strategy.
The proof is in Appendix F. Computing the optimal strategies of a POMDP and therefore optimal costs is intractable. Yet, the above theorem allows us to compare these optimal costs for different transition and observation matrices. The implication for quickest detection with optimal sampling is that we can compare the optimal cost for different PH-distributed change times and noise distributions. The implication for sensor scheduling is that we can say a priori that certain state dynamics incur a larger overall cost compared to other dynamics and noise distributions.

As a trivial consequence of the theorem, the optimal cost incurred with perfect measurements is always smaller than that with noisy measurements. Since the optimal sampling problem with perfect measurements is a fully observed MDP (or equivalently, infinite signal to noise ratio), the corresponding optimal cost forms an easily computable lower bound to the achievable cost.

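Examples: Here are examples of transition matrices $A$, $\bar{A}$ that satisfy (A2) and $A \succeq \bar{A}$.

Example 1. Geometric distributed change time:
$A = \begin{bmatrix} 1 & 0 \\ 1 - A_{22} & A_{22} \end{bmatrix}$, $\bar{A} = \begin{bmatrix} 1 & 0 \\ 1 - \bar{A}_{22} & \bar{A}_{22} \end{bmatrix}$, where $\bar{A}_{22} < A_{22}$.

Example 2. PH-distributed change time:
$A = \begin{bmatrix} 1 & 0 & 0 \\ 0.5 & 0.3 & 0.2 \\ 0.3 & 0.4 & 0.3 \end{bmatrix}$, $\bar{A} = \begin{bmatrix} 1 & 0 & 0 \\ 0.9 & 0.1 & 0 \\ 0.8 & 0.2 & 0 \end{bmatrix}$.

Example 3. Markov chain without absorbing state:
$A = \begin{bmatrix} 0.2 & 0.8 \\ 0.1 & 0.9 \end{bmatrix}$, $\bar{A} = \begin{bmatrix} 0.8 & 0.2 \\ 0.7 & 0.3 \end{bmatrix}$.

Theorem 6 applies to all these examples, implying that the total cost of the optimal policy with $A$ is larger than that with $\bar{A}$.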



B. Sensitivity to Mis-specified Model 

How sensitive is the total sampling cost to the choice of sampling strategy? Given two distinct models $\theta = (A, B)$ and $\bar{\theta} = (\bar{A}, \bar{B})$ of the optimal sampling problem, Theorem 6 above compared their optimal costs: it showed that $\theta \succeq \bar{\theta} \implies J_{\mu^*(\bar{\theta})}(\pi; \bar{\theta}) \le J_{\mu^*(\theta)}(\pi; \theta)$, where $\mu^*(\theta), \mu^*(\bar{\theta})$ denote the optimal sampling strategies for models $\theta, \bar{\theta}$, respectively (and the ordering $\succeq$ is specified in Sec. V-A). In this section, we establish the following type of sensitivity result (where the norm $\|\cdot\|$ is defined in Theorem 7 below):

$$
\sup_{\pi \in \Pi(X)} \left| J_{\mu^*(\theta)}(\pi; \theta) - J_{\mu^*(\theta)}(\pi; \bar{\theta}) \right| \le K\, \|\theta - \bar{\theta}\| \qquad (33)
$$

and we will give an explicit representation for the positive constant $K$ in Theorem 7 below.

Note the key difference between (33) and Theorem 6. In (33), we are applying the optimal strategy $\mu^*(\theta)$ for model $\theta$ to the decision problem with a different model $\bar{\theta}$. Of course, this results in sub-optimal behavior. But we will show that if the "distance" between the two models $\theta, \bar{\theta}$ is small, then the sub-optimality is small. That is, the increase in total cost from using the strategy $\mu^*(\theta)$ in the decision problem with model $\bar{\theta}$ is small.

Define 

y* eS = ini{y : (C s -C ) / r(e x , y, u;9)<0 and (C a -Co)'T(e x , y, u; 6) < Vu, u G {1, 2, . . . , L}} 

(34) 

The set depicted in (34) represents a subset of the observation space $Y$ for which the optimal decision is to stop. We assume that
(A7) P{y < y y > 0. 

Assumption (A7) holds trivially if the observation distribution $B_{xy}$ (defined in (5)) is absolutely continuous with respect to Lebesgue measure on $\mathbb{R}$, i.e., if the density has support on $\mathbb{R}$, such as Gaussian noise. (A7) is relevant for cases when the observation space is finite or a subset of $\mathbb{R}$.



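Theorem 7: Consider two optimal sampling problems with models $\theta = (A, B)$ and $\bar{\theta} = (\bar{A}, \bar{B})$, respectively. Let $J_{\mu^*(\theta)}(\pi; \theta)$ and $J_{\mu^*(\theta)}(\pi; \bar{\theta})$ denote the total costs (15) incurred by these models when using strategy $\mu^*(\theta)$. Assume $\theta$ and $\bar{\theta}$ satisfy (A2), (A3), (A4), (A7). Then the difference in the total costs is upper-bounded as:

$$
\sup_{\pi \in \Pi(X)} \left| J_{\mu^*(\theta)}(\pi; \theta) - J_{\mu^*(\theta)}(\pi; \bar{\theta}) \right| \le \max_i C(e_i, 0)\, \frac{\|\theta - \bar{\theta}\|}{1 - \rho_{\theta,\bar{\theta}}} \qquad (35)
$$

where

$$
\rho_{\theta,\bar{\theta}} = \max_{u} \sum_{y \ge y^*_{\theta,\bar{\theta}}} \sigma(e_X, y, u; \theta), \qquad
\|\theta - \bar{\theta}\| = \max_{i,u} \sum_{j} \sum_{y} \left| B_{jy}\, A^{D_u}|_{ij} - \bar{B}_{jy}\, \bar{A}^{D_u}|_{ij} \right| .
$$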



Corollary 1: Consider two optimal sampling problems with models $\theta$ and $\bar{\theta}$, respectively. Assume they have identical transition matrices, but different observation distributions denoted $B$, $\bar{B}$ (where $B$ is defined in (5)). Then bound (35) holds with

$$
\|\theta - \bar{\theta}\| = \sqrt{2}\, \max_{i,u} \sum_{j} A^{D_u}|_{ij} \left[ D(B_j \| \bar{B}_j) \right]^{1/2},
\quad \text{where } D(B_j \| \bar{B}_j) = \sum_{y} B_{jy} \ln\!\left( B_{jy} / \bar{B}_{jy} \right) \text{ (Kullback-Leibler divergence)}. \qquad (36)
$$

In particular, if the observation distributions are Gaussians with variance $\sigma^2$, $\bar{\sigma}^2$, respectively, then (35) holds with

$$
\|\theta - \bar{\theta}\| = \left( \frac{\sigma^2}{\bar{\sigma}^2} - \ln \frac{\sigma^2}{\bar{\sigma}^2} - 1 \right)^{1/2} .
$$

The proofs of Theorem 7 and Corollary 1 are in Appendix G. Corollary 1 follows from Theorem 7 via elementary use of the Pinsker inequality, which bounds the total variation norm by the Kullback-Leibler divergence. Note that the bound in (35) and (36) is tight in the sense that $\|\theta - \bar{\theta}\| = 0$ implies that the performance degradation is zero. The proof is complicated by the fact that there is no discount factor$^4$ in the cost (15). However, because the sampling problem terminates with probability one in finite time, it has an implicit discount factor. This is typical in stochastic shortest path problems that terminate in finite time flU. Assumption (A7) implies that $\rho_{\theta,\bar{\theta}} < 1$. The term $1 - \rho_{\theta,\bar{\theta}}$ can be interpreted as a lower bound to the probability of stopping at any given time. Since this is non-zero, the term $\rho_{\theta,\bar{\theta}}$ in (35) serves as this implicit discount factor.
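The Pinsker step behind Corollary 1 can be checked numerically. The sketch below is illustrative: the function names and the two example distributions are assumptions. It verifies that the $\ell_1$ distance between two discrete distributions is at most $\sqrt{2 D(p \| q)}$, and evaluates $\sqrt{2D}$ in closed form for two Gaussians with equal means (an assumption made here) and standard deviations `sigma` and `sigma_bar`.

```python
import numpy as np

def kl_divergence(p, q):
    """D(p || q) = sum_y p_y ln(p_y / q_y), with the convention 0 ln 0 = 0.
    Assumes q_y > 0 wherever p_y > 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def l1_distance(p, q):
    """Sum of absolute differences between two probability vectors."""
    return float(np.sum(np.abs(np.asarray(p, dtype=float) - np.asarray(q, dtype=float))))

def gaussian_sqrt_2kl(sigma, sigma_bar):
    """sqrt(2 D) for N(mu, sigma^2) versus N(mu, sigma_bar^2) with equal means."""
    r = (sigma / sigma_bar) ** 2
    return float(np.sqrt(r - np.log(r) - 1.0))

p = [0.30, 0.70, 0.00]   # illustrative observation distributions
q = [0.25, 0.65, 0.10]
lhs = l1_distance(p, q)
rhs = np.sqrt(2.0 * kl_divergence(p, q))
```

The Gaussian expression vanishes when the two variances coincide, which matches the remark that a zero model distance implies zero performance degradation.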

The above result is more than an intellectual curiosity. For optimal sampling problems where the transition matrix or observation distribution do not satisfy assumptions (A5), (A6) or (A7) but are $\epsilon$ close to satisfying these conditions, the above result ensures that a monotone strategy yields near optimal behavior, with an explicit bound on the performance given by (35) and (36).



4 Instead of (15), if the cost was $J_\mu(\pi_0) = E\left\{ \sum_{k=0}^{k^*-1} \rho^k C(\pi_k, u_k) + \rho^{k^*} C(\pi_{k^*}, u_{k^*} = 0) \right\}$, where $\rho \in [0, 1)$ is a user-defined discount factor, then establishing a bound such as (35) is straightforward. An artificial discount factor $\rho$ is unnatural in our problem and unnecessary, as shown in Theorem 7, since the problem terminates in finite time with probability one and hence has an implicit discount factor denoted as $\rho_{\theta,\bar{\theta}}$.
C. Stochastic Dominance Properties of the Bayesian Filter

This section presents structural properties of the Bayesian filter (10), which determines the evolution of the belief state $\pi$. Indeed, the proofs of Theorems 1-7 presented in previous sections depend on Theorem 9 given below. The results in Theorem 9 are also of independent interest in Bayesian filtering and prediction. To compare posterior distributions of Bayesian filters we need to introduce stochastic orders. We first start with some background definitions.

1) Stochastic Orders: In order to compare belief states we will use the monotone likelihood ratio (MLR) stochastic order.

Definition 1 (MLR ordering, [17]): Let $\pi_1, \pi_2 \in \Pi(X)$ be any two belief state vectors. Then $\pi_1$ is greater than $\pi_2$ with respect to the MLR ordering, denoted as $\pi_1 \ge_r \pi_2$, if

$\pi_1(i)\,\pi_2(j) \le \pi_2(i)\,\pi_1(j), \quad i < j, \quad i, j \in \{1, \ldots, X\}.$ (37)

Similarly $\pi_1 \le_r \pi_2$ if $\le$ in (37) is replaced by $\ge$.

The MLR stochastic order is useful since it is closed under conditional expectations. That is, $X \ge_r Y$ implies $E\{X | \mathcal{F}\} \ge_r E\{Y | \mathcal{F}\}$ for any two random variables $X$, $Y$ and sigma-algebra
T EH, 0, H26], El. 

Definition 2 (First order stochastic dominance, [17]): Let $\pi_1, \pi_2 \in \Pi(X)$. Then $\pi_1$ first order stochastically dominates $\pi_2$, denoted as $\pi_1 \ge_s \pi_2$, if $\sum_{i=j}^{X} \pi_1(i) \ge \sum_{i=j}^{X} \pi_2(i)$ for $j = 1, \ldots, X$.

The following result is well known [17]. It says that MLR dominance implies first order stochastic dominance and gives a necessary and sufficient condition for stochastic dominance.

Theorem 8 ([17]): (i) Let $\pi_1, \pi_2 \in \Pi(X)$. Then $\pi_1 \ge_r \pi_2$ implies $\pi_1 \ge_s \pi_2$.
(ii) Let $\mathcal{V}$ denote the set of all $X$-dimensional vectors $v$ with nondecreasing components, i.e., $v_1 \le v_2 \le \cdots \le v_X$. Then $\pi_1 \ge_s \pi_2$ iff for all $v \in \mathcal{V}$, $v'\pi_1 \ge v'\pi_2$.

For state-space dimension $X = 2$, MLR is a complete order and coincides with first order stochastic dominance. For state-space dimension $X > 2$, MLR is a partial order, i.e., $[\Pi(X), \ge_r]$ is a partially ordered set (poset), since it is not always possible to order any two belief states $\pi \in \Pi(X)$.
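The two stochastic orders are straightforward to test for given belief vectors. The following sketch uses hypothetical helper names and illustrative three-state beliefs; it shows one pair that is MLR ordered (and hence first order ordered), and one pair that is first order ordered but not MLR comparable, which illustrates why MLR is only a partial order for $X > 2$.

```python
import numpy as np
from itertools import combinations

def mlr_geq(p1, p2, tol=1e-12):
    """p1 >=_r p2 per (37): p1[i] * p2[j] <= p2[i] * p1[j] for all i < j."""
    return all(p1[i] * p2[j] <= p2[i] * p1[j] + tol
               for i, j in combinations(range(len(p1)), 2))

def fosd_geq(p1, p2, tol=1e-12):
    """p1 >=_s p2: every tail sum of p1 is at least the matching tail sum of p2."""
    tail1 = np.cumsum(np.asarray(p1)[::-1])[::-1]
    tail2 = np.cumsum(np.asarray(p2)[::-1])[::-1]
    return bool(np.all(tail1 >= tail2 - tol))

a = np.array([0.1, 0.3, 0.6])
b = np.array([0.3, 0.4, 0.3])
c = np.array([0.2, 0.5, 0.3])

a_mlr_b = mlr_geq(a, b)                           # a dominates b in the MLR order
a_fosd_b = fosd_geq(a, b)                         # and therefore also in first order
c_b_comparable = mlr_geq(c, b) or mlr_geq(b, c)   # c and b are not MLR comparable
c_fosd_b = fosd_geq(c, b)                         # yet c first order dominates b
```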

2) Main Result: With the above definitions, we are now ready to state the main result regarding 
the stochastic dominance properties of the Bayesian filter. 
Theorem 9: The following structural properties hold for the Bayesian filtering update $T(\pi, y, u)$ and normalization measure $\sigma(\pi, y, u)$ defined in (10):

1) Under (A2), $\pi_1 \ge_r \pi_2$ implies $T(\pi_1, y, u) \ge_r T(\pi_2, y, u)$.

2) Under (A2) and (A3), $\pi_1 \ge_r \pi_2$ implies the normalization measure satisfies $\sigma(\pi_1, \cdot, u) \ge_s \sigma(\pi_2, \cdot, u)$.

3) Under (A3) and (A5-(i)), the normalization measure a(ir,-,u) satisfies the following 
submodular property: 

^2 t "^' V' u+l )- °"( 7r ' i/» U )} ^ [ ff (* ' 2/' u + X ) _ cr ( 7r ' Vi u )\ for ff * 

4) For $y, \bar{y} \in Y$, $y \ge \bar{y}$ implies $T(\pi, y, u) \ge_r T(\pi, \bar{y}, u)$ iff (A3) holds.

5) Consider the ordering of transition matrices $A \succeq \bar{A}$ defined in (29).

a) If $A \succeq \bar{A}$ then $A'\pi \ge_r \bar{A}'\pi$, that is, the one-step Bayesian predictor with transition matrix $A$ MLR dominates that with transition matrix $\bar{A}$.

b) If $A \succeq \bar{A}$ and (A2) holds, then $(A^l)'\pi \ge_r (\bar{A}^l)'\pi$ for any positive integer $l$. That is, the $l$-step Bayesian predictor preserves this MLR dominance.

6) Let $T(\pi, y, u; A)$ and $\sigma(\pi, y, u; A)$ denote, respectively, the Bayesian filter update and normalization measure using transition matrix $A$. Then they satisfy the following stochastic dominance property with respect to the ordering of $A$ defined in (29):

a) $A \succeq \bar{A}$ implies $T(\pi, y, u; A) \ge_r T(\pi, y, u; \bar{A})$.

b) Under (A3), $A \succeq \bar{A}$ implies $\sigma(\pi, \cdot, u; A) \ge_s \sigma(\pi, \cdot, u; \bar{A})$.
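Parts 1 and 4 of the theorem can be illustrated numerically. The sketch below rests on assumptions: it takes the filter to have the standard hidden Markov model form $T(\pi, y, u) = B_y (A^{D_u})'\pi / \sigma(\pi, y, u)$ with $\sigma(\pi, y, u) = \mathbf{1}' B_y (A^{D_u})'\pi$ and $B_y = \mathrm{diag}(B_{1y}, \ldots, B_{Xy})$, and the TP2 matrices, beliefs and function names are illustrative choices rather than values from the paper.

```python
import numpy as np
from itertools import combinations

def mlr_geq(p1, p2, tol=1e-12):
    """p1 >=_r p2: p1[i] * p2[j] <= p2[i] * p1[j] for all i < j."""
    return all(p1[i] * p2[j] <= p2[i] * p1[j] + tol
               for i, j in combinations(range(len(p1)), 2))

def hmm_filter(pi, y, A, B, D=1):
    """Belief update after D time steps and a discrete observation y.
    Returns (posterior, sigma) with sigma the normalization term."""
    pred = np.linalg.matrix_power(A, D).T @ pi    # D-step Bayesian predictor
    unnorm = B[:, y] * pred                       # multiply by the likelihood of y
    sigma = unnorm.sum()
    return unnorm / sigma, sigma

# Illustrative TP2 transition and observation matrices (assumed values)
A = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.3, 0.2],
              [0.3, 0.4, 0.3]])
B = np.array([[0.6, 0.3, 0.1],
              [0.3, 0.4, 0.3],
              [0.1, 0.3, 0.6]])
hi = np.array([0.1, 0.3, 0.6])   # hi >=_r lo
lo = np.array([0.3, 0.4, 0.3])

# Part 1: the filter preserves the MLR order of the priors
part1_ok = all(mlr_geq(hmm_filter(hi, y, A, B, D)[0], hmm_filter(lo, y, A, B, D)[0])
               for D in (1, 3) for y in range(3))
# Part 4: a larger observation yields an MLR-larger posterior
part4_ok = all(mlr_geq(hmm_filter(hi, y + 1, A, B)[0], hmm_filter(hi, y, A, B)[0])
               for y in range(2))
```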



In words, Part 1 of the theorem implies that the Bayesian filtering recursion preserves the MLR ordering provided that the transition matrix is TP2 (A2). Part 2 says that the normalization measure preserves first order stochastic dominance provided (A2) and (A3) hold. Part 3 shows that the normalization measure is submodular. This is a crucial property in establishing Theorem 1. Part 4 shows that under (A3), the larger the observation value, the larger the posterior distribution (with respect to the MLR order). Part 5 shows that if starting with two different transition matrices but identical priors, then the optimal predictor with the larger transition matrix (in terms of the order introduced in (29)) MLR dominates the predictor with the smaller transition matrix. Part 6 says the same thing about the filtering recursion $T(\pi, y, u)$ and the normalization measure $\sigma(\pi, y, u)$.

VI. Numerical Examples

Example 1. Optimal Sampling Quickest Detection with Binary Erasure Channel measurements: Consider $X = 2$, $Y = 3$, $L = 5$, $f = 17$, $d = 0.4$, $m(e_i, 1) = 0$, $m(e_i, 2) = 2.8$, $\{D_1, D_2, D_3, D_4\} = \{1, 3, 5, 10\}$,

$A = \begin{bmatrix} 1 & 0 \\ 0.1 & 0.9 \end{bmatrix}$, $B = \begin{bmatrix} 0.3 & 0.7 & 0 \\ 0 & 0.2 & 0.8 \end{bmatrix}$.

The noisy observations of the Markov chain specified by observation probabilities $B$ model a binary non-symmetric erasure channel [6]. Note that a binary erasure channel is TP2 by construction (all second order minors are non-negative) and so (A3) holds.

The optimal strategy was computed by forming a grid of 1000 values in the 2-dimensional unit simplex, and then solving the value iteration algorithm (38) over this grid on a horizon $N$ such that $\sup_\pi |V_N(\pi) - V_{N-1}(\pi)| < 10^{-6}$. Figure 2(a) shows that when the conditions of Theorem 1 are satisfied, the strategy is monotone decreasing in the posterior $\pi(1)$. To show that the sufficient conditions of Theorem 1 are useful, Figure 2(b) gives an example where these conditions do not hold and the optimal strategy is no longer monotone. Here $m(e_i, 1) = 2.8$, $m(e_i, 2) = 0$, which therefore violates (A1) of Theorem 2.
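The grid-based computation described above can be sketched as follows. This is a rough illustration under several assumptions, not the author's implementation: the Bellman recursion is taken to be $V(\pi) = \min\{C(\pi, 0),\ \min_u [C(\pi, u) + \sum_y V(T(\pi, y, u))\,\sigma(\pi, y, u)]\}$ with the standard HMM filter, the value function is linearly interpolated on a uniform grid over $\pi(1)$, the iteration starts from the stopping cost, and a constant measurement cost $m = 1$ is used instead of the state-dependent costs quoted in the example.

```python
import numpy as np

# Illustrative parameters (assumptions loosely based on the values quoted above)
A = np.array([[1.0, 0.0], [0.1, 0.9]])             # state 1 is the absorbing target
B = np.array([[0.3, 0.7, 0.0], [0.0, 0.2, 0.8]])   # erasure-type observation matrix
delays = [1, 3, 5, 10]
f, d, m = 17.0, 0.4, 1.0                           # false alarm, delay, measurement costs

grid = np.linspace(0.0, 1.0, 1001)                 # grid over pi(1)
beliefs = np.stack([grid, 1.0 - grid], axis=1)     # each row is [pi(1), pi(2)]
e1 = np.array([1.0, 0.0])

stop_cost = f * (1.0 - grid)                       # C(pi, 0) = f (1 - e_1)' pi
powers = [np.linalg.matrix_power(A, D) for D in delays]
# C(pi, u) = c_u' pi + m with c_u = d (I + A + ... + A^{D_u - 1}) e_1
run_cost = [beliefs @ (d * sum(np.linalg.matrix_power(A, k) for k in range(D)) @ e1) + m
            for D in delays]

def q_values(V):
    """Continuation costs for u = 1..L on the grid, given V sampled on the grid."""
    Q = []
    for P, c in zip(powers, run_cost):
        pred = beliefs @ P                         # rows are the D_u-step predicted beliefs
        total = c.copy()
        for y in range(B.shape[1]):
            unnorm = pred * B[:, y]                # unnormalized posterior for observation y
            sigma = unnorm.sum(axis=1)             # probability of observing y
            post1 = np.divide(unnorm[:, 0], sigma,
                              out=np.zeros_like(sigma), where=sigma > 0)
            total += sigma * np.interp(post1, grid, V)
        Q.append(total)
    return np.array(Q)

V = stop_cost.copy()
for _ in range(10_000):
    V_new = np.minimum(stop_cost, q_values(V).min(axis=0))
    converged = np.max(np.abs(V_new - V)) < 1e-6
    V = V_new
    if converged:
        break

Q = q_values(V)
policy = np.where(stop_cost <= Q.min(axis=0), 0, 1 + Q.argmin(axis=0))
```

Plotting `policy` against `grid` gives a picture of the computed strategy; the resolution of the thresholds is limited by the grid spacing and the interpolation.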

Example 2. Optimal Sampling Quickest Detection with Gaussian noise measurements: Here we consider identical parameters to Example 1 except that the observation distribution is Gaussian with $B_{1y} \sim N(1, 1)$, $B_{2y} \sim N(2, 1)$ and measurement costs are $m(e_i, u) = 1$ for all $i \in X$, $u \in \{1, 2, 3, 4\}$. Since the measurement cost is a constant, (A1) and (A4) of Theorem 2 hold trivially. As mentioned in Sec. III-B, (A3) holds for the Gaussian distribution. Therefore Theorem 2 applies and the optimal strategy $\mu^*(\pi)$ is monotone decreasing in $\pi(1)$. Fig. 2 illustrates the optimal strategy. Next, using Theorem 5, the myopic strategy $\bar{\mu}(\pi)$ forms an upper bound to the optimal strategy $\mu^*(\pi)$ for actions $u \in \{1, \ldots, 4\}$. We used $\alpha = f / (d\,(1 - A_{21}))$ to satisfy (A7)(i) for the myopic cost in (31). As a bound for the optimal stopping region, we used the myopic stopping set $\underline{S}$ defined in (30). These are plotted in Fig. 2(a).

Example 3. Optimal Sampling Quickest Detection with Markov Modulated Poisson measurements:
The parameters here are identical to Example 2 except that the observations are generated
by a discrete time Markov Modulated Poisson process. That is, at each time $k$, observations are
generated according to the Poisson distribution $B_{xy}$, where the rates are $\lambda_1 = 1$, $\lambda_2 = 1.5$.
Since (A3) holds for the Poisson distribution, Theorem 2 applies. Fig. 2(b) illustrates the
optimal strategy. As in Example 2, the myopic strategy $\bar\mu(\pi)$ forms an upper bound.

[Figure 2: two panels, (a) Gaussian Observation Probabilities and (b) Poisson Observation Probabilities, each plotting the strategy against $\pi(1)$.]

Fig. 2. Optimal sampling strategy for action $u \in \{0 \text{ (announce change)}, 1, 2, 3, 4\}$ for a quickest-change detection problem
with geometric change time. The parameters are specified in Examples 2 and 3 in Sec. VI. The optimal strategy $\mu^*(\pi)$ is monotone
decreasing in $\pi(1)$ and is upper bounded by the myopic strategy $\bar\mu$ according to Theorem 5.

Example 4. Optimal Sampling with Phase-Distributed Change Time: Here we consider optimal
sampling quickest detection with PH-distributed change time. Consider a 3-state ($X = 3$) Markov
chain observed in noise with parameters $f = 10$, $d = 0.4$, $m(e_i,u) = 1$,
$\{D_1, D_2, D_3, D_4\} = \{1, 2, 4, 5\}$,

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0.7 & 0.3 & 0 \\ 0.3 & 0.4 & 0.3 \end{bmatrix}, \qquad B = \begin{bmatrix} 0.8 & 0.2 & 0 \\ 0.1 & 0.8 & 0.1 \\ 0 & 0.1 & 0.9 \end{bmatrix}.$$

So $\Pi(X)$ is a 2-dimensional unit simplex. The optimal strategy was computed by forming a grid
of 8000 values in the 2-dimensional unit simplex, and then solving the value iteration algorithm
(38) over this grid on a horizon $N$ such that $\sup_\pi |V_N(\pi) - V_{N-1}(\pi)| < 10^{-6}$. Fig. 3(a) shows
the optimal strategy.

It can be verified that the transition matrix $A$ satisfies (A3), (A6) and (A7) for $\alpha = 100$.
Also the observation distribution $B$ satisfies (A2). Therefore Theorem 5 holds and the optimal
strategy is upper bounded by the myopic strategy $\bar\mu(\pi)$ defined in (31). Fig. 3(b) shows the
myopic strategy $\bar\mu(\pi)$. As a bound for the optimal stopping region, we used the myopic stopping
set $\underline{S}$ defined in (30). In Fig. 3(b) these are represented by '0'.

[Figure 3: two panels, (a) Optimal Policy $\mu^*(\pi)$ and (b) Myopic Upper Bound $\bar\mu(\pi)$.]

Fig. 3. Optimal sampling strategy for action $u \in \{0 \text{ (announce change)}, 1, 2, 3, 4\}$ for a quickest-change detection problem
with PH-distributed change time specified by the 3-state Markov chain in Example 4 of Sec. VI. The belief space is a two dimensional
unit simplex (equilateral triangle). The optimal strategy is upper bounded by the myopic strategy $\bar\mu(\pi)$ according to Theorem 5.
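
Example 4 evaluates the value iteration on a grid over the belief simplex. The paper does not say how the roughly 8000 grid points were placed; the sketch below assumes a regular barycentric lattice, and the function name `simplex_grid` is hypothetical.

```python
import numpy as np


def simplex_grid(n):
    """All beliefs (i/n, j/n, k/n) with non-negative integers i + j + k = n."""
    points = [(i, j, n - i - j) for i in range(n + 1) for j in range(n + 1 - i)]
    return np.array(points, dtype=float) / n


# A resolution of n = 125 gives (n + 1)(n + 2) / 2 = 8001 beliefs,
# close to the grid size quoted in Example 4.
grid = simplex_grid(125)
print(grid.shape)
```

Each row of `grid` is a belief state in the 2-dimensional unit simplex; the three vertices correspond to the unit vectors $e_1, e_2, e_3$.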

VII. Discussion 

The paper presented structural results for the optimal sampling strategy of a Markov chain
given noisy measurements. An example dealing with quickest change detection with optimal
sampling was discussed to motivate the main results. Such problems are instances of partially
observed Markov decision processes (POMDPs) and computing the optimal sampling strategy
is intractable in general. However, this paper shows that under reasonable conditions on the
sampling costs, transition matrix and noise distribution, one can say a lot about the optimal
strategy and achievable cost using tools in stochastic dominance and lattice programming. The
main results were as follows. Theorems 1 and 2 gave sufficient conditions for the existence of a monotone
optimal sampling strategy (with respect to the posterior distribution) when the underlying Markov
chain has two states. They justified the intuition that one should make measurements less frequently
when the underlying state is away from the target state. Theorem 4 and Theorem 5 gave sufficient
conditions for the myopic sampling strategy to form a lower bound or upper bound to the optimal
sampling strategy for multi-state Markov chains. Theorem 6 gave a partial ordering for the
transition matrix and noise distributions so that the expected cost of the optimal sampling strategy
decreased as these parameters increased. This yields useful information on the achievable optimal
cost of an otherwise intractable problem. Theorem 7 gave explicit bounds on the sensitivity of the
total sampling cost with respect to sampling strategy in terms of the Kullback-Leibler divergence
between the noise distributions. Theorem 9 gave several useful structural properties of the optimal
Bayesian filtering update, including sufficient conditions that preserve monotonicity of the filter
with observation, prior distribution, transition matrix and noise distribution.

The assumptions (A1-A7) used in this paper are set valued; so even if the precise parameters 
(transition probabilities, observation distribution, costs) are not known, as long as they belong 
to the appropriate sets, the structural results hold. Thus the results have an inherent robustness. 

Finally, it is interesting to note that the results derived in this paper on sampling control do
not apply to general measurement control problems where the action affects the observation
distribution rather than the transition kernel. The reason is that it is not possible to find two
non-trivial stochastic matrices (kernels) $B$ and $\bar B$ such that the belief updates satisfy
(i) $T(\pi,y;B) \ge_r T(\pi,y;\bar B)$ and the normalization measure satisfies (ii) $\sigma(\pi,\cdot;B) \ge_s \sigma(\pi,\cdot;\bar B)$.
In [10], it is claimed that if $B$ TP2 dominates $\bar B$ then (i) and (ii) hold. However, we have found
that the only examples of stochastic kernels that satisfy the TP2 dominance are the trivial example
$B = \bar B$. In our paper, which deals with sampling control, the ordering (29) was constructed so that
two transition matrices $A$ and $\bar A$ satisfy (i) and (ii) with $B, \bar B$ replaced by $A, \bar A$. This ordering
was used in Assumption (A6).



Appendix

A. Value Iteration Algorithm

The proof of the structural results in this paper will use the value iteration algorithm [7]. Let
$n = 1, 2, \ldots$ denote the iteration number. The value iteration algorithm proceeds as follows:

$$V_{n+1}(\pi) = \min_{u} Q_n(\pi,u), \quad \text{where } Q_n(\pi,u) = C(\pi,u) + \sum_{y} V_n(T(\pi,y,u))\,\sigma(\pi,y,u),$$
$$\text{and } Q_n(\pi,0) = C(\pi,0), \text{ initialized by } V_0(\pi) = 0. \qquad (38)$$

Let $B(X)$ denote the set of bounded real-valued functions on $\Pi(X)$. Then for any $V$ and $\tilde V \in B(X)$,
define the sup-norm metric $\sup_\pi \|V(\pi) - \tilde V(\pi)\|$, $\pi \in \Pi(X)$. Then $B(X)$ is a Banach space.
The value iteration algorithm (38) will generate a sequence of value functions $\{V_k\} \subset B(X)$
that will converge uniformly (sup-norm metric) as $k \to \infty$ to $V(\pi) \in B(X)$, the optimal value
function of Bellman's equation. However, since the belief state space $\Pi(X)$ is an uncountable
set, the value iteration algorithm (38) does not translate into a practical solution methodology, as
$V_k(\pi)$ needs to be evaluated at each $\pi \in \Pi(X)$. Nevertheless, the value
iteration algorithm provides a natural method for proving our results on the structure of the
optimal strategy via mathematical induction.
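
For two states the belief is a scalar, so the recursion (38) can be approximated on a grid as in the numerical examples. The following is a rough sketch rather than the author's implementation. It assumes the stopping cost $C(\pi,0) = f\,\pi(2)$, the continuing cost $C(\pi,u) = m + d\,e_1'(I + A + \cdots + A^{D_u-1})'\pi$ with a constant measurement cost $m$, and linear interpolation of $V_n$ between grid points. The function name `value_iteration` and its arguments are hypothetical.

```python
import numpy as np


def value_iteration(A, B, D, f, d, m, grid_size=201, tol=1e-6, max_iter=5000):
    """Grid approximation of the value iteration (38) for a two-state chain."""
    grid = np.linspace(0.0, 1.0, grid_size)          # values of pi(1)
    beliefs = np.column_stack([grid, 1.0 - grid])    # each row is a belief pi'
    stop_cost = f * beliefs[:, 1]                    # assumed C(pi, 0): false alarm

    models = []
    for D_u in D:
        # Assumed C(pi, u): measurement cost plus delay accumulated over D_u steps.
        occupancy = sum(np.linalg.matrix_power(A, k) for k in range(D_u))
        cost = m + d * (beliefs @ occupancy)[:, 0]
        pred = beliefs @ np.linalg.matrix_power(A, D_u)   # rows hold ((A^{D_u})' pi)'
        sigma = pred @ B                                  # sigma(pi, y, u)
        joint = pred[:, [0]] * B[0, :]                    # unnormalized posterior of state 1
        post = np.divide(joint, sigma, out=np.zeros_like(joint), where=sigma > 0)
        models.append((cost, sigma, post))

    V = np.zeros(grid_size)                               # V_0 = 0
    for _ in range(max_iter):
        Q = [stop_cost]
        for cost, sigma, post in models:
            Q.append(cost + np.sum(np.interp(post, grid, V) * sigma, axis=1))
        Q = np.vstack(Q)
        V_new = Q.min(axis=0)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # Row 0 of Q is "announce change"; row u is sampling interval D_u.
    return grid, V, Q.argmin(axis=0)


A = np.array([[1.0, 0.0], [0.1, 0.9]])
B = np.array([[0.3, 0.7, 0.0], [0.0, 0.2, 0.8]])
grid, V, policy = value_iteration(A, B, D=[1, 3, 5, 10], f=17.0, d=0.4, m=1.0)
```

The returned `policy` array holds, for each grid value of $\pi(1)$, the index of the minimizing action, which can be plotted against $\pi(1)$ to inspect monotonicity.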

B. Proof of Theorem 1

To prove the existence of a monotone optimal strategy, we will show that $Q(\pi,u)$ in (17) is
a submodular function on the poset $[\Pi(X), \ge_r]$. Note that $[\Pi(X), \ge_r]$ is a lattice since given
any two belief states $\pi_1, \pi_2 \in \Pi(X)$, $\sup\{\pi : \pi \le_r \pi_1, \pi \le_r \pi_2\}$ and $\inf\{\pi : \pi \ge_r \pi_1, \pi \ge_r \pi_2\}$
lie in $\Pi(X)$. For $X = 2$, $\Pi(X)$ is the unit interval $[0,1]$ and in this case $[\Pi(X), \ge_r]$ is a chain
(totally ordered set).

Definition 3 (Submodular function [25]): $f : \Pi(X) \times \{1,2\} \to \mathbb{R}$ is submodular (antitone
differences) if $f(\pi,u) - f(\pi,\bar u) \le f(\bar\pi,u) - f(\bar\pi,\bar u)$, for $\bar u \le u$, $\pi \ge_r \bar\pi$.

The following result says that for a submodular function $Q(\pi,u)$, $\mu^*(\pi) = \arg\min_u Q(\pi,u)$
is increasing in its argument $\pi$. This will be used to prove the existence of a monotone optimal
strategy in Theorem 1.

Theorem 10 ([25]): If $f : \Pi(X) \times U \to \mathbb{R}$ is submodular, then there exists a $\mu^*(\pi) =
\arg\min_{u \in U} f(\pi,u)$ that is MLR increasing on $\Pi(X)$, i.e., $\pi \ge_r \bar\pi \implies \mu^*(\bar\pi) \le \mu^*(\pi)$. ■
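
The content of Theorem 10 can be illustrated numerically on a chain. The sketch below uses a toy cost $f(x,u) = (u - 3x)^2$ that is not taken from the paper; $x$ stands in for a scalar belief (as when $X = 2$) and the helper name `is_submodular` is hypothetical.

```python
import numpy as np


def is_submodular(F, tol=1e-12):
    """F[i, u] = f(x_i, u) with x_i increasing.

    Checks that the increment f(x, u + 1) - f(x, u) is non-increasing in x.
    """
    increments = np.diff(F, axis=1)                   # differences in the action
    return bool(np.all(np.diff(increments, axis=0) <= tol))


x = np.linspace(0.0, 1.0, 101)                        # points of a totally ordered set
u = np.arange(4)                                      # actions 0, 1, 2, 3
F = (u[None, :] - 3.0 * x[:, None]) ** 2              # toy cost, for illustration only
policy = F.argmin(axis=1)                             # minimizing action at each x

print(is_submodular(F), bool(np.all(np.diff(policy) >= 0)))
```

Here the increment $f(x,u+1) - f(x,u) = 2u + 1 - 6x$ decreases in $x$, so the cost is submodular and the minimizing action never decreases as $x$ grows.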
Finally, we state the following result. 

Theorem 11: The sequence of value functions $\{V_n(\pi), n = 1, 2, \ldots\}$, generated by the value
iteration algorithm (38), and the optimal value function $V(\pi)$ defined in (17) satisfy:

(i) $V_n(\pi)$ and $V(\pi)$ are concave in $\pi \in \Pi(X)$.

(ii) Under (A1), (A2), (A3), $V_n(\pi)$ and $V(\pi)$ are increasing in $\pi$ with respect to the MLR
stochastic order on $\Pi(X)$. ■

Statement (i) is well known for POMDPs, see [15] for a tutorial description. Statement (ii) is
proved in [16, Proposition 1] using mathematical induction on the value iteration algorithm.

Proof: With the above preparation, we present the proof of Theorem 1.
The first claim follows from the general result that the stopping set $S$ for a POMDP is always
a convex subset of $\Pi(X)$; see Theorem 4. Of course, a one dimensional convex set is an interval,
and since $e_1 \in S$, it follows that $S$ is the interval $S = (\pi^*, 1]$.
In light of the first claim, the optimal strategy is of the form

$$\mu^*(\pi) = \begin{cases} 0 \text{ (announce change)} & \pi \in S \\ \arg\min_{u \in \{1,2,\ldots,L\}} Q(\pi,u) & \pi \in \Pi(X) - S \end{cases}$$

So to prove the second claim, we only need to focus on belief states in the interval $\Pi(X) - S = [0, \pi^*]$
and consider actions $u \in \{1, 2, \ldots, L\}$. To prove that $\mu^*(\pi)$ is MLR increasing in
$\pi \in \Pi(X) - S$, from Theorem 10 we need to prove that $Q(\pi,u)$ is submodular, that is

$$Q(\pi,u) - Q(\pi,\bar u) - Q(\bar\pi,u) + Q(\bar\pi,\bar u) \le 0, \quad u > \bar u, \; \pi \ge_r \bar\pi.$$

From (17), the left hand side of the above expression is

C(tt, u) - C(tt, u) - C(tt, u) + C(tt, u) 

a(ir,y,u) a{n,y,u) 
y y 

+ ^ [V(T(tt, y, u)) - V(T(ir, y, u))} a(ir, y, u) (39) 
y 

Since the cost is submodular by (A4), the first line of (39) is negative. Since $V(\pi)$ is MLR
increasing from Theorem 11 and $T(\pi,y,u)$ is MLR increasing in $y$ from Theorem 9(4), it
follows that $V(T(\pi,y,u))$ is increasing in $y$. Therefore, since $\sigma(\pi,\cdot,u)$ is submodular
from Theorem 9(3), the second line of (39) is negative.

It only remains to prove that the third and fourth lines of (39) are negative. From statements (1)
and (6) of Theorem 9, it follows that $T(\pi,y,\bar u) \ge_r T(\pi,y,u) \ge_r T(\bar\pi,y,u)$ and
$T(\pi,y,\bar u) \ge_r T(\bar\pi,y,\bar u) \ge_r T(\bar\pi,y,u)$. Now we use the assumption that $X = 2$. So the belief state space $\Pi(X)$
is a one dimensional simplex that can be represented by $\pi(2) \in [0,1]$. So below we represent
$\pi$, $T(\pi,y,u)$, etc. by their second elements. Therefore using concavity of $V(\cdot)$, we can express
the last two summations in (39) as follows:



\r (rpi \ \ Trfrpf -\\^imf \ rp, -\i V(T(tt, y, u)) — V(T(tt, y, u)) 

V{T[7r, y, u)) - V{T{tt, y, u)) < T(tt, y, u) - T(tt, y, u)\ — - — — r 

T{7r,y,u) - T(ir,y,u) 

V(T(7t, y, «)) - V(T(k, y, «)) < ffi V > U \ - T{ fj y ' U \ [V(T(tt : y, «)) - V(T(7r, y, u))] 

+ y(T(7f,2/,«))-y(T(7r,j/,«)) 
Using these expressions, the sum of the last two lines of (39) is upper bounded by

J2[V(T(n,y,u))-V(T(7r,y,u))} 



a{n, y, u) + — r — — ra{w, y, u) 



T(n,y,u) - T(n,y,u) 

T(n,y,u) -T(n,y,u) 1 

+ —, r r(T 7T, y, u) (40) 

T{n,y,u) -T(7r,y,u) 

Since $V(\pi)$ is MLR increasing (Theorem 11) and $T(\pi,y,u) \ge_r T(\bar\pi,y,u)$ (using the fact that
$\pi \ge_r \bar\pi$ and Statement 1 of Theorem 9), clearly $V(T(\pi,y,u)) - V(T(\bar\pi,y,u)) \ge 0$. The term in
square brackets in (40) can be expressed as (see [11])

B 2y B ly (7i - n)(A D *\ 22 - A D *\ l2 - A D ^\ 22 - A D -\ l2 ) 
0-0, V, u) [T(vr, j/, u) - T(vr, y, u)] 
By Assumption (A5)(ii) the above term is negative. Hence (39) is negative, thereby concluding
the proof.

C. Proof of Theorem 3

Statement 1: Consider Bellman's equation (17) and define $\underline{V}(\pi) = V(\pi) - \alpha C(\pi,L)$. It is easily
checked that $\underline{V}(\pi)$ satisfies Bellman's equation with costs $C(\pi,u)$ replaced by $\underline{C}(\pi,u)$ defined
in (28). Also since the term being subtracted, namely $\alpha C(\pi,L)$, is functionally independent of
the minimization variable $u$, the argument of the minimum of (17), which is the optimal strategy
$\mu^*(\pi)$, is unchanged.

Statement 2: Since our aim is to transform the delay cost to yield an MLR increasing submodular
transformed cost, for notational convenience assume the measurement cost $m(x,u) = 0$. From
its definition in (28), straightforward computations yield that the transformed cost is

$$\underline{C}(\pi,0) = f e_2'\pi - \alpha d\, e_1'(I + A + \cdots + A^{D_L - 1})'\pi,$$
$$\underline{C}(\pi,u) = d\, e_1'\big((1-\alpha)I + \alpha A^{D_L}\big)'(I + A + \cdots + A^{D_u - 1})'\pi, \quad u \in \{1,2,\ldots,L\}.$$

So clearly for $\alpha \ge 0$, $\underline{C}(e_1,0) \le \underline{C}(e_2,0)$, and so $\underline{C}(\pi,0)$ is MLR increasing.

We now give conditions for $\underline{C}(\pi,u)$, for $u \in \{1,2,\ldots,L\}$, to be MLR increasing in $\pi \in \Pi(X)$.
By (A2), $(I + A + \cdots + A^{D_u - 1})'\pi$ is MLR increasing in $\pi \in \Pi(X)$. So for $\underline{C}(\pi,u)$ to be
MLR increasing in $\pi$, it suffices to choose $\alpha$ so that the elements of $\big((1-\alpha)I + \alpha A^{D_L}\big) e_1$ are
increasing. Given the structure of $A$ in (11), it follows that

$$\big((1-\alpha)I + \alpha A^{D_L}\big) e_1 = \begin{bmatrix} 1 \\ \alpha\big(1 - A^{D_L}|_{22}\big) \end{bmatrix}.$$

So choosing $\alpha \ge 1/(1 - A^{D_L}|_{22})$ is sufficient for $\underline{C}(\pi,u)$ to be MLR increasing in $\pi$ for
$u \in \{1,2,\ldots,L\}$ and therefore for $u \in \{0,1,\ldots,L\}$.

Next, for the transformed cost $\underline{C}(\pi,u)$ to be submodular for $u \in \{1,2,\ldots,L\}$, we require
$\underline{C}(\pi,u+1) - \underline{C}(\pi,u)$ to be MLR decreasing in $\pi$. Straightforward computations yield for
$u \in \{1,2,\ldots,L\}$,

$$\underline{C}(\pi,u+1) - \underline{C}(\pi,u) = d\, e_1'\big((1-\alpha)I + \alpha A^{D_L}\big)'\big(A^{D_u} + A^{D_u+1} + \cdots + A^{D_{u+1}-1}\big)'\pi.$$

So for $\underline{C}(\pi,u)$ to be submodular, it suffices to choose $\alpha$ so that the elements of
$\big((1-\alpha)I + \alpha A^{D_L}\big) e_1$ are decreasing, i.e., $\alpha \le 1/(1 - A^{D_L}|_{22})$.

Therefore choosing $\alpha = 1/(1 - A^{D_L}|_{22})$ is sufficient for the transformed cost $\underline{C}(\pi,u)$ to be
both MLR increasing for $u \in \{0,1,\ldots,L\}$ and submodular for $u \in \{1,2,\ldots,L\}$ on the poset
$[\Pi(X) - S, \ge_r]$.
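
The choice of $\alpha$ in this proof can be checked numerically. The sketch below assumes the two-state matrix $A$ and the longest sampling interval $D_L = 10$ of Example 1; the variable names are hypothetical.

```python
import numpy as np

# Two-state transition matrix with absorbing state 1, as read from Example 1.
A = np.array([[1.0, 0.0],
              [0.1, 0.9]])
D_L = 10                                   # longest sampling interval in Example 1

A_DL = np.linalg.matrix_power(A, D_L)
alpha = 1.0 / (1.0 - A_DL[1, 1])           # alpha = 1 / (1 - A^{D_L}|_22)

# First column of (1 - alpha) I + alpha A^{D_L}, i.e. the vector multiplying e_1.
first_column = ((1.0 - alpha) * np.eye(2) + alpha * A_DL)[:, 0]
print(alpha, first_column)
```

With this $\alpha$ both elements of the vector equal 1, so the vector is simultaneously non-decreasing and non-increasing, which is why the same $\alpha$ serves both the monotonicity and the submodularity requirement.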

D. Proof of Theorem 4

Statement 1: The proof of convexity of the stopping set $S$ follows from arguments in [15].
We repeat this for completeness here. Pick any two belief states $\pi_1, \pi_2 \in S$. To demonstrate
convexity of $S$, we need to show for any $\lambda \in [0,1]$, $\lambda\pi_1 + (1-\lambda)\pi_2 \in S$. Since $V(\pi)$ is concave
(by Theorem 11 above), it follows from (17) that

$$\begin{aligned}
V(\lambda\pi_1 + (1-\lambda)\pi_2) &\ge \lambda V(\pi_1) + (1-\lambda)V(\pi_2) \\
&= \lambda Q(\pi_1,0) + (1-\lambda)Q(\pi_2,0) \quad (\text{since } \pi_1, \pi_2 \in S) \\
&= Q(\lambda\pi_1 + (1-\lambda)\pi_2, 0) \quad (\text{since } Q(\pi,0) \text{ is linear in } \pi) \\
&\ge V(\lambda\pi_1 + (1-\lambda)\pi_2) \quad (\text{since } V(\pi) \text{ is the optimal value function}) \qquad (41)
\end{aligned}$$

Thus all the inequalities above are equalities, and $\lambda\pi_1 + (1-\lambda)\pi_2 \in S$.

Statement 2: Since the costs $C(\pi,u)$ are non-negative, so is $V(\pi)$ in (17). So from (17),
$C(\pi,0) \le C(\pi,u) \implies Q(\pi,0) \le Q(\pi,u) \implies \pi \in S$. Therefore $\underline{S} \subset S$.

Statement 3: The proof is similar to [16, Proposition 2] with the important difference that in
[16] the TP2 ordering of transition matrices is used instead of (A6). However, the TP2 ordering
(see [10] for definition) does not yield any non-trivial example.

Since $D_u < D_{u+1}$, (A6) implies $A^{D_u} \succeq A^{D_{u+1}}$. So by Statement 6(a) of Theorem 9, for
$u > \bar u$, $T(\pi,y,u) \le_r T(\pi,y,\bar u)$. By Theorem 11, $V(\pi)$ is MLR increasing in $\pi$. Therefore
$V(T(\pi,y,u)) \le V(T(\pi,y,\bar u))$. So

$$\sum_y V(T(\pi,y,u))\,\sigma(\pi,y,u) \le \sum_y V(T(\pi,y,\bar u))\,\sigma(\pi,y,u), \quad u > \bar u.$$

Since $T(\pi,y,u)$ is MLR increasing in $y$ (Statement 4 of Theorem 9) and $V(\pi)$ is MLR
increasing in $\pi$, clearly $V(T(\pi,y,\bar u))$ is increasing in $y$. Also (A6) implies $A^{D_u} \succeq A^{D_{u+1}}$ and so
$\sigma(\pi,\cdot,u) \le_s \sigma(\pi,\cdot,\bar u)$ from Statement 6(b) of Theorem 9. So

$$\sum_y V(T(\pi,y,\bar u))\,\sigma(\pi,y,u) \le \sum_y V(T(\pi,y,\bar u))\,\sigma(\pi,y,\bar u), \quad u > \bar u.$$

Therefore, $\sum_y V(T(\pi,y,u))\,\sigma(\pi,y,u) \le \sum_y V(T(\pi,y,\bar u))\,\sigma(\pi,y,\bar u)$, which is equivalent to
$Q(\pi,u) - Q(\pi,\bar u) \le C(\pi,u) - C(\pi,\bar u)$. Then [16, Lemma 2.2] implies that the minimizers of
$Q(\pi,\cdot)$ are larger than those of $C(\pi,\cdot)$. That is, $\mu^*(\pi) \ge \underline{\mu}(\pi)$ for $\pi \in \Pi(X)$.

Statement 4: By (A4), $C(\pi,u)$ is submodular on the poset $[\Pi(X), \ge_r]$. So using Theorem 10
it follows that $\underline{\mu}(\pi)$ is MLR increasing.

E. Proof of Theorem 5

Statements 1 and 2 follow directly from Theorem 4.
Statement 3: We prove this in the following steps.

Step 1. $\mu^*(\pi)$ remains invariant with transformed cost: For costs $\bar C(\pi,u)$, Bellman's equation
yields the same optimal strategy $\mu^*(\pi)$ as costs $C(\pi,u)$. To see this, consider Bellman's equation
(17) and define $\bar V(\pi) = V(\pi) + \alpha d\, e_1'A'\pi$. It is easily checked that $\bar V(\pi)$ satisfies Bellman's
equation with costs $C(\pi,u)$ replaced by $\bar C(\pi,u)$ defined in (28). Also since the term being
added, namely $\alpha d\, e_1'A'\pi$, is functionally independent of the minimization variable $u$, the argument
of the minimum of (17), which is the optimal strategy $\mu^*(\pi)$, is unchanged.

Step 2. $\bar C(\pi,u)$ is MLR decreasing: We show that (A7) implies that $\bar C(\pi,u)$ is MLR decreasing,
i.e., $-\bar C(\pi,u)$ is MLR increasing and satisfies (A1).

First consider $\bar C(\pi,0)$. Note $\bar C(e_1,0) = \alpha d$, and $\bar C(e_i,0) = f + \alpha d A_{i1}$ for $i > 1$. So
$\bar C(e_1,0) \ge \bar C(e_2,0)$ if $\alpha \ge f/(d(1 - A_{21}))$. Since $\alpha \ge 0$ and $A$ is TP2 (Assumption A3),
$A_{i1} \ge A_{i+1,1}$. So clearly $\bar C(e_i,0) \ge \bar C(e_{i+1},0)$ for $i \ge 2$.

Next consider $\bar C(\pi,u)$, $u \in \{1,2,\ldots,L\}$. Note $\bar C(e_1,u) = dD_u$ and

$$\bar C(e_i,u) = d\big(A_{i1} + A^2|_{i1} + \cdots + A^{D_u - 1}|_{i1}\big) + \alpha d\big(A_{i1} - A^{D_u+1}|_{i1}\big). \qquad (42)$$

Clearly $\bar C(e_i,u) \le d(D_u - 1) + \alpha d\big(A_{i1} - A^{D_u+1}|_{i1}\big)$. Also $A_{i1} - A^{D_u+1}|_{i1} \le 0$ since
$A^{D_u+1}|_{i1} = A_{i1} + \sum_{j>1} A_{ij} A^{D_u}|_{j1}$. Therefore for non-negative $\alpha$, $\bar C(e_1,u) \ge \bar C(e_i,u)$, $i \ge 2$. Also for
$\bar C(e_i,u) \ge \bar C(e_{i+1},u)$, $i \ge 2$, clearly from (42) it follows that (A7)(ii) is sufficient.

Step 3: $\bar C(\pi,u)$, $u \ge 1$, is submodular (satisfies (A4)). This follows similarly to Step 2.

Step 4: With the above three steps, we can now apply Theorem 4, except that $\bar C(\pi,u)$ is MLR
decreasing instead of MLR increasing as required by (A1). By a very similar proof to Theorem
4, it follows that $\bar\mu(\pi) \ge \mu^*(\pi)$.

Statement 4: Follows trivially from Statement 3 for the $X = 2$ case.

F. Proof of Theorem 6

Part 1: We first prove that dominance of transition matrices $A \succeq \bar A$ (with respect to (29))
results in dominance of optimal costs, i.e., $V(\pi;A) \ge V(\pi;\bar A)$. The proof is by induction.
$V_0(\pi;A) \ge V_0(\pi;\bar A) = 0$ by the initialization of the value iteration algorithm (38). Next, to prove
the inductive step assume that $V_n(\pi;A) \ge V_n(\pi;\bar A)$ for $\pi \in \Pi(X)$. By Theorem 11(ii), under
(A1), (A2), (A3), $V_n(\pi;A)$ and $V_n(\pi;\bar A)$ are MLR increasing in $\pi \in \Pi(X)$. From Statement
6(a) of Theorem 9, it follows that $T(\pi,y,u;A) \ge_r T(\pi,y,u;\bar A)$. This implies

$$V_n(T(\pi,y,u;A);A) \ge V_n(T(\pi,y,u;\bar A);A), \quad A \succeq \bar A.$$

Since $V_n(\pi;A) \ge V_n(\pi;\bar A)$ for all $\pi \in \Pi(X)$ by assumption, clearly
$V_n(T(\pi,y,u;\bar A);A) \ge V_n(T(\pi,y,u;\bar A);\bar A)$. Therefore

$$V_n(T(\pi,y,u;A);A) \ge V_n(T(\pi,y,u;\bar A);A) \ge V_n(T(\pi,y,u;\bar A);\bar A), \quad A \succeq \bar A.$$

Under (A2), (A3), Statement 4 of Theorem 9 says that $T(\pi,y,u;A)$ is MLR increasing
in $y$. Therefore, $V_n(T(\pi,y,u;A);A)$ is increasing in $y$. Also from Statement 2 of Theorem 9,
$\sigma(\pi,\cdot,u;A) \ge_s \sigma(\pi,\cdot,u;\bar A)$ for $A \succeq \bar A$. Therefore,

$$\sum_y V_n(T(\pi,y,u;A);A)\,\sigma(\pi,y,u;A) \ge \sum_y V_n(T(\pi,y,u;\bar A);\bar A)\,\sigma(\pi,y,u;\bar A). \qquad (43)$$

Next, we claim that under (A1) and (A2), $A \succeq \bar A$ implies that $C(\pi,u;A) \ge C(\pi,u;\bar A)$.
This follows since $c(e_i,u)$ defined in (15) has increasing components by (A1) and
$(A^l)'\pi \ge_r (\bar A^l)'\pi$ (Statement 5(b), Theorem 9). Therefore, $c_u'(A^l)'\pi \ge c_u'(\bar A^l)'\pi$, implying that
$C(\pi,u;A) \ge C(\pi,u;\bar A)$. This together with (43) implies

$$C(\pi,u;A) + \sum_y V_n(T(\pi,y,u;A);A)\,\sigma(\pi,y,u;A) \ge C(\pi,u;\bar A) + \sum_y V_n(T(\pi,y,u;\bar A);\bar A)\,\sigma(\pi,y,u;\bar A).$$

Minimizing both sides with respect to action $u$ yields $V_{n+1}(\pi;A) \ge V_{n+1}(\pi;\bar A)$ and concludes
the induction argument.

Part 2: Next we show that dominance of observation distributions $B \succeq_B \bar B$ (with respect to the
order (32)) results in dominance of the optimal costs, namely $V(\pi;\bar B) \ge V(\pi;B)$. Let $T(\pi,y,u)$
and $\bar T(\pi,y,u)$ denote the Bayesian filter update with observation $B$ and $\bar B$, respectively, and let
$\sigma(\pi,y,u)$ and $\bar\sigma(\pi,y,u)$ denote the corresponding normalization measures.

Then for $a \in Y$,

$$\bar T(\pi,a,u) = \sum_{y \in Y} T(\pi,y,u)\,\frac{\sigma(\pi,y,u)}{\bar\sigma(\pi,a,u)}\,P(a|y) \quad \text{and} \quad \bar\sigma(\pi,a,u) = \sum_{y \in Y} \sigma(\pi,y,u)\,P(a|y).$$

Therefore, $\frac{\sigma(\pi,y,u)}{\bar\sigma(\pi,a,u)}P(a|y)$ is a probability measure with respect to $y$. Since from Theorem 11
$V_n(\cdot)$ is concave for $\pi \in \Pi(X)$, using Jensen's inequality it follows that

$$V_n(\bar T(\pi,a,u);B) = V_n\Big(\sum_y T(\pi,y,u)\,\frac{\sigma(\pi,y,u)}{\bar\sigma(\pi,a,u)}\,P(a|y);B\Big) \ge \sum_y V_n(T(\pi,y,u);B)\,\frac{\sigma(\pi,y,u)}{\bar\sigma(\pi,a,u)}\,P(a|y),$$

$$\text{implying } \sum_a V_n(\bar T(\pi,a,u);B)\,\bar\sigma(\pi,a,u) \ge \sum_y V_n(T(\pi,y,u);B)\,\sigma(\pi,y,u). \qquad (44)$$

With the above inequality, the proof of the theorem follows by mathematical induction using
the value iteration algorithm (38). Assume $V_n(\pi;\bar B) \ge V_n(\pi;B)$ for $\pi \in \Pi(X)$. Then

$$\begin{aligned}
C(\pi,u) + \sum_a V_n(\bar T(\pi,a,u);\bar B)\,\bar\sigma(\pi,a,u) &\ge C(\pi,u) + \sum_a V_n(\bar T(\pi,a,u);B)\,\bar\sigma(\pi,a,u) \\
&\ge C(\pi,u) + \sum_y V_n(T(\pi,y,u);B)\,\sigma(\pi,y,u)
\end{aligned}$$

where the second inequality follows from (44). Thus $V_{n+1}(\pi;\bar B) \ge V_{n+1}(\pi;B)$. This completes
the induction step. Since the value iteration algorithm (38) converges uniformly, $V(\pi;\bar B) \ge V(\pi;B)$,
thus proving the theorem.

G. Proof of Theorem 

Define the set of belief states $\underline{S} = \cap_u \{\pi : (C_u - C_0)'\pi \ge 0\}$. Clearly $\underline{S} \subset S$. Let us
characterize the set of observations such that the Bayesian filter update $T(\pi,y,u;\theta)$ lies in $\underline{S}$
for any action $u$. Accordingly, define

Knrfi = {V ■ {Cu ~ C )'T(ir, y, u; 8) > 0, Vu, u G {1, 2, . . . , L}} } = inf {y : y G Tl%. e }. 

(45) 

Here 1Z%. e denotes the complement of set IZ^e- 

Lemma 1: Under (A2),(A3),(A4), the following hold for TZ n . e and defined in ( |45l ): 
(i) K%. e = {y-y> y*, e }. (ii) tt > r tt 7^ c 7^. (iii) tt > r tt < 

Proof: The first assertion says that the set of observations for continuing is the set {y : 
V > yt-e}- By (A4), C u — Co has decreasing elements. Since T(%,y,u;8) is MLR increasing 
in y, clearly (C u — Co)'T(Tr,y,u;8) is decreasing in y. Therefore, there exists a y*. e such that 
y > Vn-e implies T(n,y,u]8) G ^.g. This proves the first statement. By (A4), C u — C has 
decreasing elements. By (A2), (A3), T(ir, y, u; 8) is MLR increasing in n. Therefore (C a — 
Co)'T(iT,y,u; 8) > (C a — Co)'T(ex,y,u; 8) which implies TZ n . e C Tl^-e- Statement (i) says that 
T(7r, w; #) is MLR increasing in y; statement (ii) says that TZ n -e C TZ^-e- Combining these 
yields y*. g < yl. e . ■ 

For notational convenience denote the optimal strategy fx* (9) as fi. From (fT3T) . the total cost 
incurred by applying strategy fi(ix) to model 8 satisfies at time n 

jW(tt; 9) = C; (7r) 7r + £ J^in-K, y, //(tt); 0)^, y, //(tt); (9) 

since for ?/ G TZ w -e, T(ir, y, li(tt); 9) G S and so V(T(7r, y, fi(ir); 9)) = 0. 
Therefore, the absolute difference in total costs for models 8, 8 satisfies 

+ E J £ n_1) ( T (^ y> MO; kfo y> MO; - ^ y> M^); 0) I 



< 



sup \J^\ix-8)-J^ l \ix-8)\ £ a(7r,y,/i(7r);0) 
+ sup Jf-Vfad) ^aiir^,^)^) - a(ir,y,fi(ir);9)\ (46) 
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We will upper bound the various terms on the RHS of (|46l ). Statement (i) of Lemma Q] yields 
11%.$ UTZ c n . s = {y > y* n . e § } where y*.^ = min(y;. e , Next Statement (iii) of Lemma Q] 
yields < Therefore, 

sup ijJ-^M)- 4 n_1) M)l E 'fav.M*);*) 

< sup IJ^^tt^)- /^(tt; 0)1 max V a(7r,y, M ;0) 

< sup IJ^M)- J^tt; 0)| max ^ ^ x ,y,u;9) 



where the last line follows since e x > s 7r, and so Statement 2 of Theorem [9] implies er(7r, •, u; 9) < s 
a(e x , m; 0). Also evaluating a(ir, y, yu(vr); 0) = l' x B y {A'Y^ / K defined in (|T0l) yields 

^2\a(ir,y,fi(ir);9) - cr(7T, y, /i(vr); 0) | < max ^ ^ ^ | B^y 4 " | {j - B jy A u \ ij \Tr(i) 
yeY j/ i j 

< max max | R„ A" | - A" | tj | (47) 

y j 

Finally, sup^gnpq J^ n_1 ^(7r;0) < max ieX C(ei, 0). Using these bound in (l46l) yields 

sup |jW(7r;0)-jW(7r;0")|<p sup | ^(tt; 0) - J<rV; 0)| + maxC( ei , O)||0 - S\\ 
7ren(x) 7ren(X) ieX 

(48) 

where p es = max„ V > « a(e x ,y,u;6) and ||0-0|| is given by d47]). Since max u V Y °{ & x,V, 
1, then (A7) implies p e g = niax u V > * a(e x ,y,u;6) < 1. Then starting with jj?\ir;6) = 
4 )(7r; 0) = 0, unravelling (g8]) yields ([33]). 

Proof of Corollary [U When and have identical transition matrices, then (|47T) becomes 



max max 

M i 

3 



From Pinsker's inequality [6], the total variation norm is bounded by the Kullback-Leibler distance
$D$ defined in (36).
H. Proof of Theorem 9

We quote the following result from (S), which adapted to our notation reads 
Theorem 12 Lemma 8.2, pp.382]): (i) Suppose p(-) and q(-) are integrable functions on 
Y and /(•) is increasing and non-negative. Then J2y f{v)p{v) < Ey f(y)q(y) iff J2 y > y P(v) > 

(ii) Suppose f is increasing for i e X and non-negative. Then for arbitrary vectors p, q 6 M x , 

/'p > A iff Ej>yPj > Ej>] Qj for all j E X ■ 
The above theorem is similar to Statement (ii) of Theorem 8 with some important differences.
Unlike Theorem 8, $p$ and $q$ need not be probability measures. On the other hand, Theorem 8
does not require $f$ to be non-negative.

Proof of Theorem 9: Statements 1, 2 and 4 of the theorem are proved in [16].

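Statement 3: Suppose $\pi \ge_r \bar\pi$. Then clearly (A5)-(i) implies that

$$\sum_{j \ge q}\sum_i \big(A^{D_{u+1}}|_{ij} - A^{D_u}|_{ij}\big)\,\pi(i) \le \sum_{j \ge q}\sum_i \big(A^{D_{u+1}}|_{ij} - A^{D_u}|_{ij}\big)\,\bar\pi(i).$$

Also (A3) implies that $\sum_{y \ge q} B_{jy}$ is increasing in $j$. Then applying Theorem 12(i) yields

$$\sum_j \sum_{y \ge q} B_{jy} \sum_i \big(A^{D_{u+1}}|_{ij} - A^{D_u}|_{ij}\big)\,\pi(i) \le \sum_j \sum_{y \ge q} B_{jy} \sum_i \big(A^{D_{u+1}}|_{ij} - A^{D_u}|_{ij}\big)\,\bar\pi(i).$$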

Statement 5(a): The proof is as follows. By definition, $A'\pi \ge_r \bar A'\pi$ is equivalent to

$$\sum_{i \in X}\sum_{m \in X} \big(A_{ij}\bar A_{m,j+1} - \bar A_{ij}A_{m,j+1}\big)\,\pi_i \pi_m \le 0.$$

Thus clearly (29) is a sufficient condition for $A'\pi \ge_r \bar A'\pi$.

Statement 5(b): Since $A \succeq \bar A$ implies $A'\pi \ge_r \bar A'\pi$, it follows from (A2) that
$A'A'\pi \ge_r A'\bar A'\pi$. Also Statement 5(a) implies $A'\bar A'\pi \ge_r \bar A'\bar A'\pi$. Since the MLR order is transitive, these
inequalities imply $A'A'\pi \ge_r \bar A'\bar A'\pi$. Continuing similarly, it follows that for any positive integer
$l$, $(A^l)'\pi \ge_r (\bar A^l)'\pi$.

Statement 6(a): This follows trivially since Bayes' rule preserves MLR dominance. That is,
$\pi \ge_r \bar\pi$ implies $\frac{B_y\pi}{\mathbf{1}'B_y\pi} \ge_r \frac{B_y\bar\pi}{\mathbf{1}'B_y\bar\pi}$. Since by Statement 5(a), $A \succeq \bar A$ implies
$A'\pi \ge_r \bar A'\pi$, applying the Bayes rule preservation of MLR dominance proves the result.

Statement 6(b): Since $A \succeq \bar A$ implies $A'\pi \ge_r \bar A'\pi$, it follows that $A'\pi \ge_s \bar A'\pi$. Next (A3)
implies that $\sum_{y \ge q} B_{iy}$ is increasing in $i$. Therefore
$\sum_{i \in X}\sum_{y \ge q} B_{iy}\,[A'\pi](i) \ge \sum_{i \in X}\sum_{y \ge q} B_{iy}\,[\bar A'\pi](i)$.
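
The monotonicity properties of the filter used throughout these proofs can be spot-checked numerically. The sketch below is only an illustration under assumptions: it uses the 3-state matrices of Example 4, two arbitrarily chosen beliefs, and hypothetical helper names `mlr_geq` and `hmm_filter`; it checks particular instances and proves nothing in general.

```python
import numpy as np


def mlr_geq(p, q, tol=1e-12):
    """True if p >=_r q, i.e. p[i] * q[j] <= p[j] * q[i] for all i < j."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    n = len(p)
    return all(p[i] * q[j] <= p[j] * q[i] + tol
               for i in range(n) for j in range(i + 1, n))


def hmm_filter(pi, A, B, y, D=1):
    """Return T(pi, y, u) and sigma(pi, y, u) for sampling interval D = D_u."""
    pred = np.linalg.matrix_power(A, D).T @ pi    # (A^D)' pi
    unnorm = B[:, y] * pred                       # B_y (A^D)' pi
    sigma = unnorm.sum()                          # normalization measure
    return unnorm / sigma, sigma


# Matrices as read from Example 4.
A = np.array([[1.0, 0.0, 0.0], [0.7, 0.3, 0.0], [0.3, 0.4, 0.3]])
B = np.array([[0.8, 0.2, 0.0], [0.1, 0.8, 0.1], [0.0, 0.1, 0.9]])

pi_hi = np.array([0.2, 0.3, 0.5])                 # arbitrary beliefs with pi_hi >=_r pi_lo
pi_lo = np.array([0.5, 0.3, 0.2])

# Monotone in the prior: an MLR-larger prior gives an MLR-larger posterior.
prior_ok = all(mlr_geq(hmm_filter(pi_hi, A, B, y)[0], hmm_filter(pi_lo, A, B, y)[0])
               for y in range(3))
# Monotone in the observation: a larger y gives an MLR-larger posterior.
obs_ok = all(mlr_geq(hmm_filter(pi_hi, A, B, y + 1)[0], hmm_filter(pi_hi, A, B, y)[0])
             for y in range(2))
print(prior_ok, obs_ok)
```

Changing `D` to another sampling interval lets the same check be repeated for each action $u$.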
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